mmap: cannot allocate memory #4392

Closed
iDemonix opened this Issue Jul 17, 2018 · 57 comments

@iDemonix

iDemonix commented Jul 17, 2018

I've been running Prometheus for a while now. I've returned after the weekend to find it has died and won't start back up; here are the logs:

Jul 17 07:13:39 prom1 systemd: Started Prometheus.
Jul 17 07:13:39 prom1 systemd: Starting Prometheus...
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.004275608Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.1, branch=HEAD, revision=188ca45bd85ce843071e768d855722a9d9dabe03)"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.004470979Z caller=main.go:223 build_context="(go=go1.10.3, user=root@82ef94f1b8f7, date=20180619-15:58:53)"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.004547646Z caller=main.go:224 host_details="(Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 prom1.tc1.ifl.net (none))"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.004616789Z caller=main.go:225 fd_limits="(soft=1024, hard=4096)"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.007481818Z caller=main.go:514 msg="Starting TSDB ..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.009186238Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531389600000 maxt=1531396800000 ulid=01CJ797B7Z7B3AK51QAYKRJNA8
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.009405676Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531396800000 maxt=1531404000000 ulid=01CJ7G32FZN8NWMVD3WF9AF9ND
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.00956541Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531404000000 maxt=1531411200000 ulid=01CJ7PYSQZ9V42W59P2JG6A6VP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.009730578Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531411200000 maxt=1531418400000 ulid=01CJ7XTGZZ8PFP6VD37KV0MQZH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.009865675Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531418400000 maxt=1531425600000 ulid=01CJ84P87ZHAS7X2S1E7801D51
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.009989572Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531425600000 maxt=1531432800000 ulid=01CJ8BHZFYWA1YBW41QZTPTXJT
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010109456Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531432800000 maxt=1531440000000 ulid=01CJ8JDPQXRV0QQY8YE3YBF893
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01022655Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8S9E00SENYPRCSJPG08A97
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010344493Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SBPH5GP9KYZVYMS3848YD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010466488Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SDW6SJ9PD3NN8R0BW92HY
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010587861Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SG356WDNJF9MT8ZZ215JN
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010732419Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SJ82SD88JDGY9R7SWW9J5
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010856969Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SMCK7X83BHE6YP62T784W
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.010979213Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SPHD9FF28VHG7J5XQ2NVH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.0110986Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SRPDMCK99196W2YC4CE21
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.011218371Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8STT1X2HN1XATJM86QWPSA
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.011337804Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SX00A0EPCFKDART7QKQR9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.011477018Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8SZ4R367QSYGCVQ96VYX8R
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.011604753Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8T19MVEXNJZNPM6973YZBS
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.011752903Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8T3DS40F7NPJ7WEVK3M3NH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01187759Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8T5J8C6KBPS65Y2HT6YZJ3
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012014197Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8T7QSND4YTAFGA5Z70WSG3
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012135081Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8T9VXC66YZZHS8V3ZDCR4X
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012270478Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TC07TSVR794FE7R067XTB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012405152Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TE5A6TXN57MV5D789X9QJ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012534753Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TG9H4BJFC4FYCSJ61YY3S
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012704593Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TJERJXY5CDBMWVFX30ZFD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012862264Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TMJXQ91A2EXQS1EM79E5X
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.012988194Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TPQJBNJ2WJA5P749J0TST
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013115532Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TRW2ZHXSP95TRCYDEZEZV
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013243196Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TV2DB1W823JT1GZBSQGSH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01336626Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TX5X9601996V7TPCZ8VK6
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013482807Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8TZB7YED7K5K7HPF9823NE
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013606221Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8V1FQM94W3RF2DS31YJE1P
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013751122Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8V3K6X4J0P9NPS371R2FCK
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013873608Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8V5RZ0M3093M4ZEGVXJW8E
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.013997233Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8V7XBH17RV8S4YS3AC6XTD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.014118046Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VA1KJ0PSHXFB8XZAG3VKW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01425004Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VC4WQ72RQDRARPX3277FS
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.0143991Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VE8KF9GCEEN4NXMXTCN8G
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.014521771Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VGCZQWY52K1XY6V220001
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.014691192Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VJJXA4Q4TT7PG7MFHKAZH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.014826099Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VMQW2GJSRAYMZDVK3KTJT
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01494562Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VPVY84Z63X0SYRXE7JZ89
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01506297Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VS07M1E02WE1PQXT15J58
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015186751Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VV5XWWQBJG3CA4C39SQWR
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015310031Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VXAP5BZZPVSSE4QQYT3HP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015430352Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8VZE93K91N7WT48A6EDMD3
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015548619Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8W1KC3BA1RY6NZHV6FG8S2
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015686243Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8W3QNCGJKCDDE18A7V5EWH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015813956Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8W5W8VDZ4C4B16V9XWA0Z2
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.015932731Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8W82EHB2BGQ6B6V7BR5W78
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016065934Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WA80QVZ4BHRJCJFH2PADR
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016198105Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WCC2RFW06JDFMNFY0RBSC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016325888Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WEK2BZ4371C3DWMNZRR4S
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016462289Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WGQXFGQ928PHXBBXAWFES
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016585286Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WJX8DPY5XS07RP3HCT4KZ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016725857Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WN1A219T5K21MQ1C57RMN
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016862418Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WQ62AY25ZHFWB74G4129R
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.016983421Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WSBEG9S1A90YGBBZCH2DF
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.017100152Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WVFAZDNCWAB25RKNVRHJ3
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.017217152Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WXM3MPSXFKQCXPKFFN230
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.017343067Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8WZRDJ22GVD3EZ0DSET8AP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.01747831Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8X1W7GV4C021YHDDTVBFR0
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.017600328Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8X41G9RSVCQ5XERWEHWV53
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.017722908Z caller=web.go:415 component=web msg="Start listening for connections" address=0.0.0.0:9090
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.020550136Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8X67BWPANXZ37CVTX7RY2E
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.020745437Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8X8BTQQHTPY6YQRKXXCZ5Q
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.020883695Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XAFM8TNHBBE0B7FDZ8H4M
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021013902Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XCMX76XDE7FQ0ZV6TX5K9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021145646Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XESDFZ7JH92YGR36T8BH0
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021266366Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XGX4M2WBP1E012S4QWF65
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021384367Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XK3NN8524XHPHZH0G4VWB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021531994Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XN82PQ50W4T8ZV486X1VR
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021697185Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XQD5GVJ86MT1CMSEQS5SK
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021832989Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XSH4D843QBEJRJNVQ8TEC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.021980446Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XVN8WQVXARHHPT68CZ51B
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.02211064Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8XXSXT4RDEK5QZ43QT5GMQ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.02223101Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Y00XPWTCCMX232DC4CNSB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.022352174Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Y2602B8Y2GWPDFBPJD597
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.022470011Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Y4AE9R5HWQPCW4BN5H6BC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.022607625Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Y6G7PHF953DYVN649GX3X
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.022763642Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Y8P378DD22GWZJS28VNN4
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.022889246Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YAW901FMEYNKBBG948CHX
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023025047Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YD39EAY54NKCKKZ1NTMHN
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023149634Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YF9WAE6EC2NG78C309GVQ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023273892Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YHG4MR7H7A2RF4YYTZQNJ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023391268Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YKN0G3HMNFQNMJYB41QMR
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023508413Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YNVSJQ176XY2E2KQXFJEV
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023630133Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YR0QMDQ6ZGGS1H9Z2FEPH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023779074Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YT79GA1EKD6AYTV0NA34E
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.023905054Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YWCG437S850DW1R62DA8M
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024025218Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8YYHT9E1EJWZ9R7BK6A97Y
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024143815Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Z0RSYXKBDNF09YK41YDR9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024266126Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Z2YJARMXCH5KSZN197K07
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024398432Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Z558EE3K4XN6ZWM12KC45
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024521883Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Z7BP2SSH91YP0473XQJ9D
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024638383Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8Z9JA99BR3SC7TJJ33XCSC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024791711Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZBSSFDFP5JDEBAM00H9BH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.024910215Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZDZZ7K147E9A1HG8NCGJY
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025032179Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZG4V8YXQGVHPQM0EQX2ZH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025165449Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZJB61QXFMNQM55QSQNJKM
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025292046Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZMJ76TJWY390WRY6AFR23
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025416667Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZPQY0ZAVMPW72JRRJD3S6
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025535567Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZRWA2GDHQWF6FTWGM3NYT
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025674638Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZV2VNGC3BX0CCFC1TWFP5
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025801815Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZX9JVWYWJWP0Z8JNJNGFZ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.025940429Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ8ZZDK6387W6CK5WEV6MMBF
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.026065386Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ901KJMCYCSZJPCJC1M2Y13
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.026188487Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ903RR1HZKHAKXPMVNE20NM
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.02631239Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ905XPPSKWWHR578H2DNKNB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.026429905Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90832N2F4FZS873SYPM7PF
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.026564821Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90A8M5FKWQHT5P4AHX2R59
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.027281211Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90CF567N3ZYJG6JC3C0SV9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.027435762Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90ENH0VJ9AGCBZ94TEQVVW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.027577883Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90GVBS4MPXE2VN8D2P7N13
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.02773114Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90K0A571CZ86VAGZFR6ZNA
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.027908581Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90N5CPJ7KQRAP9RXVRWR6T
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028041112Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90QC56NVA576AWRHTA30YD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028161848Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90SK99F52F1ZNK2E9A2NXX
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028286026Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90VSMJS2VXR4V2Z2Y8W58Q
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028405396Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ90Y0DHXD969VRDFVERN47N
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028521314Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9106ME16TG1H37C8WQDZAK
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028638167Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ912E34BQZ0XX8K0P1XHPRX
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028787301Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ914N4QQVYW46ZYX7MJWES5
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.028912008Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ916V92S5ZQ4JM5XEYH93HQ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029056482Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9190XP1KZPA7DZ8M4HCK08
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029185393Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91B6J20QHTRN40VQ9BJ5W4
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029319747Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91DEJQ7WS1CMGRRMDKDFXH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029440991Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91FNRAX43DKEX3Q43SS08Q
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029558478Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91HWCYP2KTSZKCW7GVCNHP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029704885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91M30DYRENA3ER84GG0YTK
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.029846475Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91P98EFMMFWX0C2DDR37HY
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.02998615Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91RG5V6WRGBRPZ4PFHG05C
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030111813Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91TNDB0P2AHVDGGX13Z5S0
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030230811Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91WVCXPW7VEFYJQPRDEZVZ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030353754Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ91Z16XT7FXHZXVJ1239T12
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030472522Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9216VTWCPAQCGQX38MTWP1
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030589882Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ923CDRXAREPD9R99C7MY7D
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.030732933Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ925J5PAJEYNJ3FM61HWNXB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03087914Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ927S937STZXXBMVMZY35KF
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03100931Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ929YT4KMD8X7N0MZP1DNTW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031144548Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92C4XFVP1JPCH85V7TC72B
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031262841Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92EBMSB4E950TTA82ACNSX
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031379546Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92GH4N6F81DJ74420A1TCW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031515982Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92JPPJDGMBH72XKA97H3DM
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031639743Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92MTX2J62Q8JAQY2TC59XT
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03178763Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92PZRFBTZ1RTXWQ0RPGADD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.031907894Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92S5KK49YSACZAVXYYRV0T
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032045501Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92VAWMCMJJ76AWMWXZANPJ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032177392Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92XFAATAQG45M10PCS215G
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032294889Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ92ZPD5KN6YPJDCQT1V71HZ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03241155Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ931W7DEXEW4KRKAV1CM83M
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032528617Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9341TB3XV4YP929MV3PK8J
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032712054Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9368XS0AQWJXYGWSKNZQ5H
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032842498Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ938E4GDPDQF9MBYTGV8JXK
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.032972822Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93AKGZHMMJGTKN044BGWDY
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.033110199Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93CT4ATEE15XRWTJMC0GAC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.033231793Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93EYB46X8BDHB9C76VYCSB
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.033974033Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93H4NM8VW09SBWKVDJBQ6T
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.034328154Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93KBYCNA6GBB8VPBE488VG
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.034600662Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93NMWRG7Y6D1S5472Y79D4
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03485032Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93QWTESJDGFSZ1ZBN1RDYG
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.035066851Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93T413NCAN66XZ8CK43NX0
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.035281258Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93WA3J9HPVKATY6CV47PXP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.035478586Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ93YGWX5EMT0K0K5Y66PFCJ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.035699534Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ940RBGCCXAACV1AA4GRZ4D
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.035895921Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94305QAH7HHE2F27ZGZV7H
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.036117779Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ9457NCJ32A1BFN3E8TG005
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.036321159Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ947EC4XPS8REQ01C8T2EGD
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.036514997Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ949KVS16132A2JTYWC99E0
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.036727511Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94BSHZ4GE4HVK60FQEJ5GY
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.036932249Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94DZ4AE2J8FZZZ89NMEXK8
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03714964Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94G45BFKFCX9VP924CP2ZM
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.037355447Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94JAGAJ57WMKH80N6GB08H
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.037547908Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94MGCWNCNASEJRTYZB5G69
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.037797429Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94PQTKJ6QNEGW5N5J8ACRP
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.037995063Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94RY745991Z6T50YTBQ671
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.038190227Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94V3GSMS41EKX12Z37PEHN
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.038372625Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94X8028JKKM59XFT0MK7SG
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.038571216Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ94ZEVYKTNFVFPPVE3P8PPE
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.038789477Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ951M7774T4S4651WXHPKTA
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.038979601Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ953XHZK9TH3PC2GAM6R0CW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.039162151Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ956567TH60GAY0S063MDNH
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.039347895Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ958C0Y64YC23523FJ0PRCE
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.03952188Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95AJCJNHJST0XP3DW4SSWC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.039717307Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95CSTQ7ZXC54B4VCJMYFJC
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.039907898Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95EZY469N9N2TXPA4WBES9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.040105682Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95H8HM0VCK5VAXY0DB0FWJ
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.040293133Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95KES3CEVZFPFESDZ3SY4X
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.041087803Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95NPGTTCQ4Y213ZAC4GZRW
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.04128544Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95QYJWCEWMHP66SXFK09V9
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.041481374Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95T53Z9VTB0NP0APKZGH6K
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.041694339Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95WCDGC16B2Y4XQ8122PD3
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.041876966Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ95YKDA1PZ88M1W0KV6SN35
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.04205036Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ960VF316XZK81A02MBN9V1
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.042240061Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96344KQNRNDNNEVBZKG6BE
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.042418102Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ965AG8WY7PQ1B6MWDKDGJE
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.042608263Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ967JFSWXCAJKPGCBYWDM03
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.04283245Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ969S8AZV5BQ8QC6NBPAM6C
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.043013837Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96BZH0WQX1Y6DA4GVNDA3J
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.043190621Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96E6RN9BH8V2TETZKD9R4Q
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.043369452Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96GCVZWZWQHA29573EG97P
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.043567447Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96JJ6TN53EVHM7E487MMR7
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.043765411Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531440000000 maxt=1531447200000 ulid=01CJ96MQW6WSCW9Q2FFGDWGD7K
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.302955146Z caller=main.go:402 msg="Stopping scrape discovery manager..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303051627Z caller=main.go:416 msg="Stopping notify discovery manager..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303092483Z caller=main.go:438 msg="Stopping scrape manager..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.30314706Z caller=manager.go:464 component="rule manager" msg="Stopping rule manager..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.3031915Z caller=manager.go:470 component="rule manager" msg="Rule manager stopped"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303234454Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303303588Z caller=main.go:398 msg="Scrape discovery manager stopped"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303341808Z caller=main.go:412 msg="Notify discovery manager stopped"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303525922Z caller=main.go:432 msg="Scrape manager stopped"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303588702Z caller=main.go:588 msg="Notifier manager stopped"
Jul 17 07:13:40 prom1 prometheus: level=error ts=2018-07-17T07:13:40.303624619Z caller=main.go:597 err="Opening storage failed open block /var/prometheus/data/01CJ8SPHD9FF28VHG7J5XQ2NVH: mmap files: mmap: cannot allocate memory"
Jul 17 07:13:40 prom1 prometheus: level=info ts=2018-07-17T07:13:40.303888543Z caller=main.go:599 msg="See you next time!"

The server has plenty of free capacity, both in terms of disk space and RAM/swap. If I temporarily move the block it complains about to /tmp, it just moves on to the next block with the same complaint. Any advice?

@krasi-georgiev

Member

krasi-georgiev commented Jul 17, 2018

Thanks for your report.

This message comes from the OS, so it can't be a bug in Prometheus.
I suggest doing a Google search for the message "mmap: cannot allocate memory"; it should give you some clues.

Another option would be to move the question to our user mailing list.

If you haven't looked already you might find your answer in the FAQ
or the official docs and examples or by searching in the users or devs groups.
The #prometheus IRC channel is also a great place to mix with the community and ask questions (don't be afraid to answer a few while waiting).

If you think this is not purely a support question, feel free to comment in here or in the dev mailing list.

Once your questions have been answered, please add a link to the solution to help other Prometheans who reach this issue from a search 👍

@iDemonix

Author

iDemonix commented Jul 17, 2018

Thanks @krasi-georgiev. I couldn't solve this one; my solution in the end was just to delete the data directory and start again, which worked straight away. Luckily I've only been running for a few months and it's not live yet, but I'll probably now look at federation.

@marcelmay


marcelmay commented Jul 20, 2018

Just happened to me, too - on a large 256G memory machine, where Prometheus takes <12G memory. There was > 200G free memory (buffers, cache) available, but Prometheus (v2.2.1 or latest v2.3.2) refused to start.

As noted above, the workaround was to create a new, empty data directory to get Prometheus up and running.

It seems to affect more users - e.g. #4168 (mmap fails despite free memory):

...still have 30G free memory

@krasi-georgiev: Is there a Linux kernel setting required for mmap-ing lots of large files, as Prometheus requires?

When starting Prometheus, we noticed RES memory < 10G but VIRT memory skyrocketing.
It looks a bit as if it hits the process virtual memory limit ... unfortunately I was not able to try a ulimit -v unlimited due to missing privileges.
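
(For reference: when Prometheus is managed by systemd, as in the log above, the limit has to be raised on the unit rather than in a shell. A minimal sketch, assuming the unit is named prometheus.service - names and paths are illustrative:)

# raise the virtual memory (RLIMIT_AS) limit for the unit via a drop-in
sudo systemctl edit prometheus.service
# ...and add the following two lines in the editor:
#   [Service]
#   LimitAS=infinity
sudo systemctl daemon-reload
sudo systemctl restart prometheus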

@iDemonix

Author

iDemonix commented Jul 20, 2018

I'd also be interested in any performance tweaks that could help, but this instance is only monitoring a few hundred hosts.

I've also noticed that I can tell when it's going to crash, as the directories in /var/prometheus/data start appearing every minute instead of at the usual intervals:

drwxr-xr-x. 3 root root 4096 Jul 19 15:00 01CJSGW3DB6EWY5KCCJHT5H007
drwxr-xr-x. 3 root root 4096 Jul 19 17:00 01CJSQQTNCHP5NRBX4BPAGAB0V
drwxr-xr-x. 3 root root 4096 Jul 19 17:00 01CJSQRDVQXHXZBJ4HC6JTAYJT
drwxr-xr-x. 3 root root 4096 Jul 19 19:00 01CJSYKHX9R98QZ380XS9RQV3G
drwxr-xr-x. 3 root root 4096 Jul 19 21:00 01CJT5F95D5X9439W6MG03PMBS
drwxr-xr-x. 3 root root 4096 Jul 19 23:00 01CJTCB0DF31VRVCYT6YZXP99K
drwxr-xr-x. 3 root root 4096 Jul 20 01:00 01CJTK6QND324ZF1X2J90EH0DP
drwxr-xr-x. 3 root root 4096 Jul 20 03:00 01CJTT2EXDD589YEMWPGMP7GQG
drwxr-xr-x. 3 root root 4096 Jul 20 03:01 01CJTT4PG2K3QBAQGJ5YEWGPQP
drwxr-xr-x. 3 root root 4096 Jul 20 03:02 01CJTT6XMF5EV0N6PPKBV2KJES
drwxr-xr-x. 3 root root 4096 Jul 20 03:03 01CJTT928YW83MZBZXZ6X8GVV7
drwxr-xr-x. 3 root root 4096 Jul 20 03:04 01CJTTB7WFSTM7RVE4HCFGPFAY
drwxr-xr-x. 3 root root 4096 Jul 20 03:06 01CJTTDD3TRTEN4HB5SP7WN6S4
drwxr-xr-x. 3 root root 4096 Jul 20 03:07 01CJTTFJ00GHH5ZT7WK4WE6V8N
drwxr-xr-x. 3 root root 4096 Jul 20 03:08 01CJTTHQVYX6G9J34ZVKEW4D8V
drwxr-xr-x. 3 root root 4096 Jul 20 03:09 01CJTTKXBDAP5N2DZX9KH5BRDW
drwxr-xr-x. 3 root root 4096 Jul 20 03:10 01CJTTP2D9E5FJN43TTBN7ZD18
drwxr-xr-x. 3 root root 4096 Jul 20 03:12 01CJTTR805V1TKN4WNBKSFB6FQ
drwxr-xr-x. 3 root root 4096 Jul 20 03:13 01CJTTTDJZ6RNYRHSDBAVTZ54G
drwxr-xr-x. 3 root root 4096 Jul 20 03:14 01CJTTWHK99QT774D9D20525RW
drwxr-xr-x. 3 root root 4096 Jul 20 03:15 01CJTTYQ76AW76P0WWJ3F20K5G
drwxr-xr-x. 3 root root 4096 Jul 20 03:16 01CJTV0XRNSH4E1WPFAK3TW429

The host definitely isn't running out of RAM; it's a VM on an ESX cluster that isn't showing any health issues across the other VMs. I'll keep trying things - I may have to build a server directly on tin to try and narrow down the issue...

@juliusv

Member

juliusv commented Jul 21, 2018

Reopening this because it seems it was closed prematurely. Hitting an mmap limit like this is probably still a Prometheus problem, especially if the machines have plenty of available memory otherwise. Maybe the TSDB tries to mmap too many blocks at once at startup or something like that?
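
One quick way to check whether the per-process mapping count is the bottleneck would be to compare the live mapping count of the process against the kernel limit (a sketch; it assumes the process is simply named prometheus):

# number of memory mappings the running Prometheus process currently holds
wc -l < /proc/$(pgrep -x prometheus)/maps

# kernel cap on mappings per process
sysctl vm.max_map_count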

@juliusv juliusv reopened this Jul 21, 2018

@juliusv

Member

juliusv commented Jul 21, 2018

@marcelmay Have you had a look at http://mroonga.org/docs/faq/mmap_cannot_allocate_memory.html and increasing the vm.max_map_count limit? What's its current value? You could also strace Prometheus at startup to log all the mmaps (with sizes) it does for further insight.
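
A rough sketch of both suggestions (the value, binary path, and config path below are placeholders, not recommendations):

# check and raise the mapping limit, then persist it across reboots
sysctl vm.max_map_count
sudo sysctl -w vm.max_map_count=262144
echo 'vm.max_map_count = 262144' | sudo tee /etc/sysctl.d/99-prometheus.conf

# trace the mmap calls (with sizes) Prometheus makes during startup
sudo strace -f -e trace=mmap,munmap -o /tmp/prometheus-mmap.trace \
  /usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml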

@marcelmay


marcelmay commented Jul 21, 2018

@juliusv : Thx for the hint.

We tried that, starting at the default of 65k and doubling it up to 512k.

Since Prometheus had worked nicely for at least 2 months, we stopped doubling there (we assumed only a small increase from the default would be needed). One interesting thing we noted was that Prometheus startup bailed out at a different mmapped file each time we doubled the value, so the parameter does seem to have an impact.

I will try to increase vm.max_map_count again on Monday, and will also try ulimit -v unlimited.

@brian-brazil

Member

brian-brazil commented Jul 21, 2018

still a Prometheus problem, especially if the machines have plenty of available memory otherwise.

Previous research into this indicated it was a kernel problem, and we're not the only application to hit it.

@juliusv

Member

juliusv commented Jul 21, 2018

still a Prometheus problem, especially if the machines have plenty of available memory otherwise.

Previous research into this indicated it was a kernel problem, and we're not the only application to hit it.

If this has indeed been determined to be a kernel problem already, it's a good idea to link to the previously gained insights about the problem. Is it just about too low vm.max_map_count values, or do certain kernel versions have a bug? Or is there something in the TSDB that can be improved to not over-mmap? Without knowing more context, this error can totally plausibly be a Prometheus issue of mmap-ing too much at the same time.

There's been an equivalent issue that was also closed by just saying that the machine doesn't have enough memory, although reportedly there was enough memory available in that case too: #4168 (comment).

@marcelmay


marcelmay commented Jul 24, 2018

Increasing the ulimit (-v) for virtual memory to a value greater than the data size did fix the issue for me.

Running top while starting Prometheus (v2.3.2) showed VIRT memory increasing during startup to roughly the data size (~500GiB), while resident memory stayed around 10GiB.

The previous virtual memory limit when Prometheus bailed out with mmap: cannot allocate memory was around 200GiB.

From setrlimit(2):

RLIMIT_AS
This is the maximum size of the process's virtual memory (address space). The limit is specified in bytes, and is rounded down to the system page size. This limit affects calls to brk(2), mmap(2), and mremap(2), which fail with the error ENOMEM upon exceeding this limit. In addition, automatic stack expansion fails (and generates a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.

Unfortunately, Prometheus later ran into #4388 - but that's a different issue.

Anyway, maybe Prometheus could print out a hint about the virtual memory limit when mmap fails, or when it notices that it is approaching that limit (if someone else can confirm the fix)?

Final note: I have not reverted my increase of vm.max_map_count back to the default, so I'm not sure whether increasing that value was also required. As previously commented, increasing this parameter by 8x did not solve the mmap error for me.

@simonpasquier

Member

simonpasquier commented Jul 25, 2018

Thanks a lot @marcelmay for the very detailed explanation. I agree that it would be good to print out the vm limit on startup like we do for fd limits.
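
Until such a hint exists, a quick way to see the limit actually in effect for the running process (a sketch; the PID lookup assumes the process is named prometheus):

grep -i 'address space' /proc/$(pgrep -x prometheus)/limits
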
@iDemonix does it solve your issue?

@iDemonix

Author

iDemonix commented Jul 25, 2018

Thanks for the explanations and advice. I will try the suggestions and get back to you, @simonpasquier.

@iDemonix

Author

iDemonix commented Jul 30, 2018

Even with ulimit -v unlimited I've come in after the weekend to find the same issue. I modified the vm.max_map_count to 10x its value but no dice. This VM is built in a different data centre to the one I had at the start of the issue, on different hardware with different specs. Has anything changed in relation to mmap in the last few minor releases?

Again, removing the data directory allows it to start back up. I've been moving to Prometheus to get away from Icinga2, but I'm struggling now as a fresh VM with enough memory seems to corrupt my data every few days.

I can see some mentions of TSDB work in 2.3.2 - I could run an earlier version in one DC and 2.3.2 in the other to see if one outlives the other? I'm running out of things to try, and I need this to go into production soon.

I'd be interested to hear others' TSDB settings. My max block duration was 6h, but I've left it at the default now, which comes out to 9d. Here are my settings:

storage.tsdb.max-block-duration  9d
storage.tsdb.min-block-duration  2h
storage.tsdb.no-lockfile         false
storage.tsdb.path                /var/prometheus/data/
storage.tsdb.retention           90d

@veox


veox commented Jul 30, 2018

@iDemonix I have found that it may not be necessary to delete the entire data directory; removing "just" the ones that are "too frequent" (as in your earlier comment) would work.

I've done this on one "collector" machine with

find ./ -maxdepth 1 -type d -ctime 2 -exec sudo rm -rf {} \;

This left a couple of those "too frequent" block directories, but Prometheus was able to start anyway (after some hurdles with high iowait).


(Below not likely to be the same case as @iDemonix.)

There's nothing that monitors the machine other than itself (oops!..), and it seems that it was writing unusual amounts of data to disk (/home/prometheus/data in my case) before it crashed, and was unable to restart:

(I might've accidentally deleted a few blocks manually - oops, again.)

@iDemonix (Author) commented Jul 31, 2018

Happened again overnight. I'm not sure whether to roll back a CentOS minor release or roll back a couple of Prometheus versions, as both have been upgraded in the last month, and I can't seem to find any way to run Prometheus for more than 48h now.

[root@prometheus data]$ls -lah
total 944K
drwxr-xr-x. 233 root root  16K Jul 31 06:30 .
drwxr-xr-x.   4 root root 4.0K Jul 30 09:07 ..
drwxr-xr-x.   3 root root 4.0K Jul 30 14:00 01CKNMC9BTMBNXZ62W5ECMQY1V
drwxr-xr-x.   3 root root 4.0K Jul 30 16:00 01CKNV80KSP1GVT30T2G3A8Q3V
drwxr-xr-x.   3 root root 4.0K Jul 30 18:00 01CKP23QVQPASM3D5GRWSTRS0A
drwxr-xr-x.   3 root root 4.0K Jul 30 20:00 01CKP8ZF3SB3W2Y27MKF3V2BFH
drwxr-xr-x.   3 root root 4.0K Jul 30 22:00 01CKPFV6BSJBRSE0D75D7VXNKC
drwxr-xr-x.   3 root root 4.0K Jul 31 00:00 01CKPPPXKRTTZGBGGS83S6YDPQ
drwxr-xr-x.   3 root root 4.0K Jul 31 02:00 01CKPXJMVRXCTC5Z4F383R4E6K
drwxr-xr-x.   3 root root 4.0K Jul 31 02:01 01CKPXMWW499SF3JKZEZV62W6G
drwxr-xr-x.   3 root root 4.0K Jul 31 02:02 01CKPXQ1NAHYYEAZDS70BMTHXT
drwxr-xr-x.   3 root root 4.0K Jul 31 02:03 01CKPXS70NRDPQ8BRC1ZCT1VSC
drwxr-xr-x.   3 root root 4.0K Jul 31 02:04 01CKPXVC976WCNSB1F3E57W4RY
drwxr-xr-x.   3 root root 4.0K Jul 31 02:06 01CKPXXK52GABFC4X3ZY4AH74V
drwxr-xr-x.   3 root root 4.0K Jul 31 02:07 01CKPXZS5PR5W8J63TCC12T3BQ
drwxr-xr-x.   3 root root 4.0K Jul 31 02:08 01CKPY1ZPTD79CQHTA2C0RST86
...
[output truncated]

After running:

find ./ -maxdepth 1 -type d -newermt '7/31/2018 02:00:00' -exec sudo rm -rf {} \;

I can start Prometheus again, albeit with missing data. Why does Prometheus start writing blocks at 1-minute intervals when the configured minimum block duration is 2 hours? Would it not be possible to throw an error or restart (some part of, or all of) the service?

If I catch Prometheus doing this with the /data directory, I can stop the process, remove the broken files and restart without an issue, during all of this I don't run out of RAM, and I don't see a HDD being filled up rapidly (there's 480GB free).

Update: I usually just erase data/ and start again; this time I ran the above command, started back up, and I'm now getting:

Jul 31 08:35:37 prom2 prometheus: level=error ts=2018-07-31T07:35:37.174652356Z caller=db.go:272 component=tsdb msg="compaction failed" err="persist head block: write compaction: iterate compaction set: get series 4284: not found"
Jul 31 08:36:39 prom2 prometheus: level=error ts=2018-07-31T07:36:39.860733621Z caller=db.go:272 component=tsdb msg="compaction failed" err="persist head block: write compaction: iterate compaction set: get series 4284: not found"
Jul 31 08:37:42 prom2 prometheus: level=error ts=2018-07-31T07:37:42.585802525Z caller=db.go:272 component=tsdb msg="compaction failed" err="persist head block: write compaction: iterate compaction set: get series 4284: not found"
Jul 31 08:38:45 prom2 prometheus: level=error ts=2018-07-31T07:38:45.084885257Z caller=db.go:272 component=tsdb msg="compaction failed" err="persist head block: write compaction: iterate compaction set: get series 4284: not found"
@simonpasquier (Member) commented Jul 31, 2018

My max block duration was 6h, but I've left that to default now and it goes to 9d

In general we recommend not to modify the min/max block duration settings. They're mostly used for benchmarking and testing.

Having block directories that are created every minute or so seems to indicate a bug with the compaction. If the faulty server is still on v2.3.1, can you upgrade to v2.3.2? It includes several tsdb fixes that may be related to your issue.
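Concretely that would mean dropping the block-duration flags and letting tsdb use its defaults — a sketch only; the binary path is an assumption and the other values are taken from the settings you posted above:

/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/prometheus/data/ \
  --storage.tsdb.retention=90d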

@iDemonix (Author) commented Jul 31, 2018

Hi @simonpasquier, I'm running 2.3.2 and 2.3.1 on two separate servers now, with the same specs. I'm going to see which one lasts longer and will report back. The last few crashes have actually been from the 2.3.2 version, I believe (although starting to lose track!)

@iDemonix (Author) commented Jul 31, 2018

@marcelmay @veox would you mind sharing your Linux flavour + kernel version?

I'm trying to figure out what's changed. I ran Prometheus for 2-3 months without issue, but unfortunately a bad hardware failure lost that VM entirely. Since repeating the exact same build steps with the same config, I hit this mmap issue constantly. As I lost my original VM I can't check exact versions, but I believe there's been some CentOS 7 and Linux kernel updates since, so I'm starting to think about trying a kernel downgrade on a test box. My VMs are built by Puppet so they're identical in build, all I can think that has changed is the Linux Kernel version, and I may have been running Prometheus 2.2.1.

@iDemonix (Author) commented Jul 31, 2018

Some more logs below from my 2.3.2 install, which was creating repeated directories in /var/prometheus/data:

Jul 31 12:46:35 prom2 systemd: Started Prometheus.
Jul 31 12:46:35 prom2 systemd: Starting Prometheus...
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.941095276Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=HEAD, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.942348553Z caller=main.go:223 build_context="(go=go1.10.3, user=root@5258e0bd9cc1, date=20180712-14:05:26)"
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.94246377Z caller=main.go:224 host_details="(Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 prom2.th1.ifl.net (none))"
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.942519141Z caller=main.go:225 fd_limits="(soft=1024, hard=4096)"
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.94422615Z caller=main.go:533 msg="Starting TSDB ..."
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.944649108Z caller=web.go:415 component=web msg="Start listening for connections" address=0.0.0.0:9090
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945200696Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532944800000 maxt=1532952000000 ulid=01CKNMC9BTMBNXZ62W5ECMQY1V
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945413397Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532952000000 maxt=1532959200000 ulid=01CKNV80KSP1GVT30T2G3A8Q3V
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945544094Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532959200000 maxt=1532966400000 ulid=01CKP23QVQPASM3D5GRWSTRS0A
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945635161Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532966400000 maxt=1532973600000 ulid=01CKP8ZF3SB3W2Y27MKF3V2BFH
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945732125Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532973600000 maxt=1532980800000 ulid=01CKPFV6BSJBRSE0D75D7VXNKC
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945828262Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532980800000 maxt=1532988000000 ulid=01CKPPPXKRTTZGBGGS83S6YDPQ
Jul 31 12:46:35 prom2 prometheus: level=info ts=2018-07-31T11:46:35.945913055Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532988000000 maxt=1532995200000 ulid=01CKQK65JTYDK045MXX74BCC5K

# Prometheus boot-up hangs here for about 3 minutes

Jul 31 12:48:16 prom2 prometheus: level=warn ts=2018-07-31T11:48:16.118092451Z caller=head.go:320 component=tsdb msg="unknown series references in WAL samples" count=3362600
Jul 31 12:48:16 prom2 prometheus: level=info ts=2018-07-31T11:48:16.261559726Z caller=main.go:543 msg="TSDB started"
Jul 31 12:48:16 prom2 prometheus: level=info ts=2018-07-31T11:48:16.261622981Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
Jul 31 12:48:16 prom2 prometheus: level=info ts=2018-07-31T11:48:16.263167605Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
Jul 31 12:48:16 prom2 prometheus: level=info ts=2018-07-31T11:48:16.26320593Z caller=main.go:502 msg="Server is ready to receive web requests."
Jul 31 12:48:25 prom2 prometheus: level=info ts=2018-07-31T11:48:25.440343232Z caller=compact.go:398 component=tsdb msg="write block" mint=1532995200000 maxt=1533002400000 ulid=01CKR2NT6ZQ74X2Z7C7RT4VF9F
Jul 31 12:48:28 prom2 prometheus: level=info ts=2018-07-31T11:48:28.701125234Z caller=head.go:348 component=tsdb msg="head GC completed" duration=304.670966ms
Jul 31 12:48:28 prom2 prometheus: level=info ts=2018-07-31T11:48:28.701191739Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=2.21µs
Jul 31 12:48:33 prom2 prometheus: level=info ts=2018-07-31T11:48:33.310006881Z caller=compact.go:398 component=tsdb msg="write block" mint=1533002400000 maxt=1533009600000 ulid=01CKR2P3XVF4H0988ZP6XQSFAZ
Jul 31 12:48:36 prom2 prometheus: level=info ts=2018-07-31T11:48:36.272035043Z caller=head.go:348 component=tsdb msg="head GC completed" duration=193.045333ms
Jul 31 12:48:36 prom2 prometheus: level=info ts=2018-07-31T11:48:36.272095864Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=2.077µs
Jul 31 12:48:41 prom2 prometheus: level=info ts=2018-07-31T11:48:41.086276626Z caller=compact.go:398 component=tsdb msg="write block" mint=1533009600000 maxt=1533016800000 ulid=01CKR2PBFQVTEAVTDDE7VW6KX3
Jul 31 12:48:44 prom2 prometheus: level=info ts=2018-07-31T11:48:44.172776416Z caller=head.go:348 component=tsdb msg="head GC completed" duration=237.567629ms
Jul 31 12:48:44 prom2 prometheus: level=info ts=2018-07-31T11:48:44.172849345Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=2.802µs
Jul 31 12:48:53 prom2 prometheus: level=info ts=2018-07-31T11:48:53.837190327Z caller=compact.go:398 component=tsdb msg="write block" mint=1533016800000 maxt=1533024000000 ulid=01CKR2PK58XZ566RWEZW4GB44Y
Jul 31 12:48:57 prom2 prometheus: level=info ts=2018-07-31T11:48:57.040305567Z caller=head.go:348 component=tsdb msg="head GC completed" duration=204.302134ms
Jul 31 12:48:59 prom2 prometheus: level=info ts=2018-07-31T11:48:59.079634175Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=2.039267668s
Jul 31 12:49:09 prom2 prometheus: level=info ts=2018-07-31T11:49:09.854691924Z caller=compact.go:398 component=tsdb msg="write block" mint=1533024000000 maxt=1533031200000 ulid=01CKR2Q1CN4QXSEYD93E5H8Z8S
Jul 31 12:49:12 prom2 prometheus: level=error ts=2018-07-31T11:49:12.565271873Z caller=db.go:272 component=tsdb msg="compaction failed" err="reload blocks: open block /var/prometheus/data/01CKR2Q1CN4QXSEYD93E5H8Z8S: mmap files: mmap: cannot allocate memory"
Jul 31 12:49:24 prom2 prometheus: level=info ts=2018-07-31T11:49:24.139786604Z caller=compact.go:398 component=tsdb msg="write block" mint=1533024000000 maxt=1533031200000 ulid=01CKR2QD6XDZHSS5MN5NSHAQPN
Jul 31 12:49:26 prom2 prometheus: level=error ts=2018-07-31T11:49:26.860043943Z caller=db.go:272 component=tsdb msg="compaction failed" err="reload blocks: open block /var/prometheus/data/01CKR2Q1CN4QXSEYD93E5H8Z8S: mmap files: mmap: cannot allocate memory"
Jul 31 12:49:38 prom2 prometheus: level=info ts=2018-07-31T11:49:38.3662846Z caller=compact.go:398 component=tsdb msg="write block" mint=1533024000000 maxt=1533031200000 ulid=01CKR2QW4WK4ZA5R01XCCR0T3H
Jul 31 12:49:41 prom2 prometheus: level=error ts=2018-07-31T11:49:41.16735163Z caller=db.go:272 component=tsdb msg="compaction failed" err="reload blocks: open block /var/prometheus/data/01CKR2Q1CN4QXSEYD93E5H8Z8S: mmap files: mmap: cannot allocate memory"

At the point of mmap complaining about memory, I could see in a separate terminal there was 3.5G of memory still available. The ulimit setting is unlimited, and the vm.max_map_count has been raised.

@simonpasquier (Member) commented Jul 31, 2018

@iDemonix as a double-check can you verify the actual limits of the process by inspecting /proc/<PID>/limits?
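For example, something like this should show them (assuming the process is simply named prometheus):

# "Max address space" is the RLIMIT_AS value that mmap(2) is checked against
cat /proc/"$(pgrep -x prometheus)"/limits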

IIUC while persisting the head to an immutable block, tsdb fails to reload the blocks (because of the mmap error). A few seconds later, the same process happens: tsdb persists the head to a block, the reload fails, and now we have another almost identical block on disk. This doesn't explain why the mmap call fails, but it at least explains why you have so many items in your data directory.

// Persist the in-memory head as a new immutable block on disk.
if _, err = db.compactor.Write(db.dir, head, mint, maxt, nil); err != nil {
	return changes, errors.Wrap(err, "persist head block")
}
changes = true
runtime.GC()
// Re-open the on-disk blocks; this is where the mmap failure surfaces, so the head
// is never truncated and an almost identical block gets written again on the next cycle.
if err := db.reload(); err != nil {
	return changes, errors.Wrap(err, "reload blocks")
}

@iDemonix (Author) commented Jul 31, 2018

Hi Simon,

Here's the output:

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             31198                31198                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       31198                31198                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us    
@iDemonix (Author) commented Aug 1, 2018

Crashed again overnight and won't start back up. There's 10G of memory free and barely any metrics stored, but it won't start:

Aug  1 08:24:12 prom2 prometheus: level=error ts=2018-08-01T07:24:12.89097635Z caller=main.go:596 err="Opening storage failed open block /var/prometheus/data/01CKT4ZTC5PGDK8T4ZPYEH8REV: mmap files: mmap: cannot allocate memory"
Aug  1 08:24:12 prom2 systemd: prometheus.service: main process exited, code=exited, status=1/FAILURE
Aug  1 08:24:12 prom2 systemd: Unit prometheus.service entered failed state.
Aug  1 08:24:12 prom2 systemd: prometheus.service failed.
^C
[root@prom2 ~]$cd /var/prometheus/data/
[root@prom2 data]$
[root@prom2 data]$du -sh *
227M	01CKS27X7M5BA8ZXTCWDB3RN5E
232M	01CKS93MFPQJTPRG99B5R1FRZF
386M	01CKS9462B8067ZMJSBQPNX1G2
232M	01CKSFZBQ1CC3858RA4D8BV29C
231M	01CKSPV2ZHBR5FHB9FAJW7B5P6
236M	01CKSXPT8CZDM51JT4FT94M3X7
236M	01CKT4JHFKYWDJ61WEGQWTPHYH
236M	01CKT4MT3S4DW8BJP5HYH656WY
236M	01CKT4PZRFH9QGAQ6MAR1G1WBM
236M	01CKT4S6RMYQ902363QR6H1CPH
236M	01CKT4VDGTPKQYM68DS05Y9CN3
236M	01CKT4XMDKEX1GKYDZW24KWXAH
236M	01CKT4ZTC5PGDK8T4ZPYEH8REV
236M	01CKT51ZK0014T1NJG910B21M9
236M	01CKT545KG0B1WZK39X2CQ4HBV
236M	01CKT56CXWWN4CD6NTJDW1EFDE
236M	01CKT58K3CGTKEF5W0X872HBXV
236M	01CKT5ASA8H0CRSYZD3VVBE6RN
236M	01CKT5CYJKMEY2BQA0B4Z7EH25
236M	01CKT5F45RX0B94MN81AXADV9T
236M	01CKT5HCCT2WXQD0R7K2DC8YD0
236M	01CKT5KGTY8BWYQ2HNDV0F6WWY
236M	01CKT5NPXGAGY836X92R4JT1KS
236M	01CKT5QZG08YEHR5C2GK3GM3YT
236M	01CKT5T6EXRC2AR9KY08FTQVAE
0	lock
3.3G	wal
[root@prom2 data]$ls -lah
total 112K
drwxr-xr-x. 28 root root 4.0K Aug  1 08:21 .
drwxr-xr-x.  4 root root 4.0K Jul 31 15:17 ..
drwxr-xr-x.  3 root root 4.0K Jul 31 22:00 01CKS27X7M5BA8ZXTCWDB3RN5E
drwxr-xr-x.  3 root root 4.0K Aug  1 00:00 01CKS93MFPQJTPRG99B5R1FRZF
drwxr-xr-x.  3 root root 4.0K Aug  1 00:00 01CKS9462B8067ZMJSBQPNX1G2
drwxr-xr-x.  3 root root 4.0K Aug  1 02:00 01CKSFZBQ1CC3858RA4D8BV29C
drwxr-xr-x.  3 root root 4.0K Aug  1 04:00 01CKSPV2ZHBR5FHB9FAJW7B5P6
drwxr-xr-x.  3 root root 4.0K Aug  1 06:00 01CKSXPT8CZDM51JT4FT94M3X7
drwxr-xr-x.  3 root root 4.0K Aug  1 08:00 01CKT4JHFKYWDJ61WEGQWTPHYH
drwxr-xr-x.  3 root root 4.0K Aug  1 08:01 01CKT4MT3S4DW8BJP5HYH656WY
drwxr-xr-x.  3 root root 4.0K Aug  1 08:02 01CKT4PZRFH9QGAQ6MAR1G1WBM
drwxr-xr-x.  3 root root 4.0K Aug  1 08:03 01CKT4S6RMYQ902363QR6H1CPH
drwxr-xr-x.  3 root root 4.0K Aug  1 08:05 01CKT4VDGTPKQYM68DS05Y9CN3
drwxr-xr-x.  3 root root 4.0K Aug  1 08:06 01CKT4XMDKEX1GKYDZW24KWXAH
drwxr-xr-x.  3 root root 4.0K Aug  1 08:07 01CKT4ZTC5PGDK8T4ZPYEH8REV
drwxr-xr-x.  3 root root 4.0K Aug  1 08:08 01CKT51ZK0014T1NJG910B21M9
drwxr-xr-x.  3 root root 4.0K Aug  1 08:09 01CKT545KG0B1WZK39X2CQ4HBV
drwxr-xr-x.  3 root root 4.0K Aug  1 08:11 01CKT56CXWWN4CD6NTJDW1EFDE
drwxr-xr-x.  3 root root 4.0K Aug  1 08:12 01CKT58K3CGTKEF5W0X872HBXV
drwxr-xr-x.  3 root root 4.0K Aug  1 08:13 01CKT5ASA8H0CRSYZD3VVBE6RN
drwxr-xr-x.  3 root root 4.0K Aug  1 08:14 01CKT5CYJKMEY2BQA0B4Z7EH25
drwxr-xr-x.  3 root root 4.0K Aug  1 08:15 01CKT5F45RX0B94MN81AXADV9T
drwxr-xr-x.  3 root root 4.0K Aug  1 08:16 01CKT5HCCT2WXQD0R7K2DC8YD0
drwxr-xr-x.  3 root root 4.0K Aug  1 08:18 01CKT5KGTY8BWYQ2HNDV0F6WWY
drwxr-xr-x.  3 root root 4.0K Aug  1 08:19 01CKT5NPXGAGY836X92R4JT1KS
drwxr-xr-x.  3 root root 4.0K Aug  1 08:20 01CKT5QZG08YEHR5C2GK3GM3YT
drwxr-xr-x.  3 root root 4.0K Aug  1 08:21 01CKT5T6EXRC2AR9KY08FTQVAE
-rw-r--r--.  1 root root    0 Jul 31 15:17 lock
drwxr-xr-x.  2 root root 4.0K Aug  1 08:14 wal

What I'm also a bit confused by: the dupe directories it creates seem to be the same size from the faulty one onwards. Could it be this repeated writing of large files/dirs that makes it hit some sort of limit? I've now run out of things to try (the error occurs on both 2.3.2 and 2.3.1), apart from starting to drop kernel versions until it stops dying.

echo 1 > /proc/sys/vm/overcommit_memory
echo 200 > /proc/sys/vm/nr_hugepages
[root@prom1 data]$cat /proc/sys/vm/max_map_count 
65530
[root@prom1 data]$echo 1966080 > /proc/sys/vm/max_map_count 

I also tried the above commands on both boxes, but no joy, even after persisting them and rebooting:

Aug  1 09:01:40 prom1 prometheus: level=error ts=2018-08-01T08:01:40.003962814Z caller=db.go:277 component=tsdb msg="compaction failed" err="compact [/var/prometheus/data/01CKS27X6TE975NSH9AK2CR3A4 /var/prometheus/data/01CKS93METCA8F34HDEY1WW8D6 /var/prometheus/data/01CKSFZBPVN9JFSFHZ2K5P0K09]: mmap files: mmap: cannot allocate memory"

Aug  1 09:02:42 prom1 prometheus: level=error ts=2018-08-01T08:02:42.017785895Z caller=db.go:277 component=tsdb msg="compaction failed" err="compact [/var/prometheus/data/01CKS27X6TE975NSH9AK2CR3A4 /var/prometheus/data/01CKS93METCA8F34HDEY1WW8D6 /var/prometheus/data/01CKSFZBPVN9JFSFHZ2K5P0K09]: mmap files: mmap: cannot allocate memory"

Does anyone have any other settings I could change? It looks like the latest Prometheus no longer runs on the latest CentOS7 out of the box, even with plenty of RAM?

@simonpasquier (Member) commented Aug 1, 2018

@iDemonix as I wrote in my earlier comment, the truncation process is creating the same block directory over and over because the reloading fails. This is a bug that needs to be fixed, but in the meantime, to get back to a normal state, you can move the duplicate directories as well as the wal directory somewhere else, since they just amplify the mmap failure. It doesn't mean that the problem won't occur again, but it might give you some room. As you said, maybe a kernel update triggers the problem. I'm trying to reproduce it on my end.
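A rough sketch of that clean-up, assuming the paths from the earlier listings and an arbitrary backup location; stop Prometheus first and adjust the timestamp to when the one-minute blocks started appearing:

sudo systemctl stop prometheus
mkdir -p /var/prometheus/backup
# move the WAL aside first
mv /var/prometheus/data/wal /var/prometheus/backup/
# move the duplicated one-minute blocks out of the data directory
find /var/prometheus/data -mindepth 1 -maxdepth 1 -type d \
  -newermt '2018-08-01 08:00:00' -exec mv -t /var/prometheus/backup/ {} +
sudo systemctl start prometheus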

@iDemonix (Author) commented Aug 1, 2018

Thanks @simonpasquier, I've bumped the kernel up to a 4.x release to try that and got the same thing after a couple of hours. I've since gone back to my original Kernel version, and I'm running it in Docker as I need to start collecting some usable data - will see if the Docker variant runs ok.

@iDemonix (Author) commented Aug 2, 2018

On my two equal boxes, I ran 2.3.0 overnight on one, and 2.3.2 in a Docker container on the other. The 2.3.0 one is throwing mmap errors, but the Docker instance with a data volume doesn't appear to have any problems - it's not really a fix but this is likely the route I'll now go down as it seems to be more stable.
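For reference, roughly what that looks like — the mount points and tag follow the stock prom/prometheus image layout (config at /etc/prometheus/prometheus.yml, data at /prometheus), adjust to taste:

docker run -d --name prometheus -p 9090:9090 \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /var/prometheus/data:/prometheus \
  prom/prometheus:v2.3.2
# note: the 2.x image runs as an unprivileged user, so you may need to fix
# ownership of the host data directory before the first start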

@simonpasquier (Member) commented Aug 3, 2018

I fail to reproduce the issue: when setting a low virtual memory limit (ulimit -v xxx), the program usually panics on a malloc().

Can you share the following kernel parameters: kernel.shmmni, kernel.shmall and kernel.shmmax? The output of /proc/meminfo might be useful too.

@iDemonix (Author) commented Aug 3, 2018

Hi @simonpasquier, requested output below.

[root@prom1 ~]$cat /proc/sys/kernel/shmmax
18446744073692774399
[root@prom1 ~]$cat /proc/sys/kernel/shmall
18446744073692774399
[root@prom1 ~]$cat /proc/sys/kernel/shmmni
4096
[root@prom1 ~]$cat /proc/meminfo
MemTotal:       12139344 kB
MemFree:        11083156 kB
MemAvailable:   11513360 kB
Buffers:           45784 kB
Cached:           559804 kB
SwapCached:            0 kB
Active:           395628 kB
Inactive:         370888 kB
Active(anon):     161392 kB
Inactive(anon):     8272 kB
Active(file):     234236 kB
Inactive(file):   362616 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        160916 kB
Mapped:            53316 kB
Shmem:              8748 kB
Slab:             174648 kB
SReclaimable:     142108 kB
SUnreclaim:        32540 kB
KernelStack:        2896 kB
PageTables:         6212 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8166820 kB
Committed_AS:     685136 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      172580 kB
VmallocChunk:   34359341052 kB
HardwareCorrupted:     0 kB
AnonHugePages:     32768 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      188352 kB
DirectMap2M:    12394496 kB

My Prom1 box is just sitting with a failed Prometheus that won't start anymore due to mmap issues; on my Prom2 box I installed Docker instead and so far, so good (using an external data volume).

I'll rebuild the Prom1 box into a Docker host soon; if you'd like any more diagnostics over the weekend, let me know! I tried a few more vm.* and similar sysctl settings but I couldn't get it to stop crashing, and as it can take anything from 1-12 hours for problems to start appearing, I've simply stuck with Docker for now.

Thanks

@veox commented Aug 31, 2018

Doesn't seem to show anything useful... I'm taking a different approach: inspecting the "broken" database on my workstation, which has much more memory.

Currently, it's failing with "too many open files", so I'll "truncate" the database...

@juliusv (Member) commented Aug 31, 2018

Currently, it's failing with "too many open files", so I'll "truncate" the database...

See https://www.robustperception.io/dealing-with-too-many-open-files for that.

@veox commented Aug 31, 2018

Yup, I know how to deal with that. :) Just wanted to give an interim result while I'm inspecting this...


Output of ~/go/bin/tsdb ls .: gist.

Looks like just a block time range overlap, which is to be expected?.. So nothing inherently wrong with the database?..

@veox commented Aug 31, 2018

BTW, here's the node-exporter view (of the occurrence today, not the earlier one in the gist above) - this time with memory.

I had a thought that this might be happening due to the memory being already filled by disk cache, but that seems false.

(Tried taking a screenshot of all the graphs, but Grafana has them in an iframe, so Firefox can't take it.)

@omegarus commented Oct 2, 2018

I don't know if it's relevant, but I had a similar problem. I changed the binary to amd64 and the problem disappeared.

@simonpasquier (Member) commented Oct 2, 2018

@omegarus what do you mean exactly by "change binary to amd64"? What was the architecture before?

@fusionswap commented Oct 11, 2018

I have been running into this issue now. ulimit -v unlimited and increasing vm.max_map_count don't help. Is there any workaround for this other than deleting the data directory?

We are running on RHEL 7.5 (kernel 3.10.0-862.11.6.el7.x86_64); the Prometheus version is 2.1.0.
I also tried running the latest Prometheus version pointed at a copy of the data directory. It still fails.

@krasi-georgiev (Member) commented Oct 12, 2018

@fusionswap what is the error message you are getting?

@simonpasquier (Member) commented Oct 12, 2018

@fusionswap can you upgrade to 2.4.3 and share the configuration + log files? Also /proc/sys/kernel/shm{mni,max,all} and /proc/meminfo might be useful.

@fusionswap commented Oct 12, 2018

@krasi-georgiev it's the same error.
Oct 11 11:16:01 prometheus: level=error ts=2018-10-11T18:16:01.794464676Z caller=main.go:579 err="Opening storage failed open block /prometheus/data/01CS1S8AA6170FV8Z0E516YZEE: mmap: cannot allocate memory"

@fusionswap commented Oct 12, 2018

@simonpasquier

I took a backup of the data directory and ran Prometheus 2.4.3 pointing it at this backup data directory.

config gist: https://gist.github.com/fusionswap/427b45be82de6a05d76fee8e8dcdc18a
prometheus startup logs gist: https://gist.github.com/fusionswap/1ac0365b7f90d6f89a824d669eeb11a1

I also ran it with strace.
strace gist prometheus 2.1: https://gist.github.com/fusionswap/4150b46af31ccc98725f4ff95b0e409c
strace gist prometheus 2.4.3: https://gist.github.com/fusionswap/35f14a532f3a512899f62f71e77e0a6e

Also, note that I am seeing lots of directories created at 1-minute intervals, similar to @iDemonix.

/proc details are below.

sh-4.2# cat /proc/sys/kernel/shmmni
4096

sh-4.2# cat /proc/sys/kernel/shmmax
18446744073692774399

sh-4.2# cat /proc/sys/kernel/shmall
18446744073692774399

sh-4.2# cat /proc/meminfo
MemTotal: 16266860 kB
MemFree: 10425440 kB
MemAvailable: 14431428 kB
Buffers: 7680 kB
Cached: 3899340 kB
SwapCached: 0 kB
Active: 3026160 kB
Inactive: 2119536 kB
Active(anon): 784412 kB
Inactive(anon): 455100 kB
Active(file): 2241748 kB
Inactive(file): 1664436 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 292 kB
Writeback: 0 kB
AnonPages: 1238684 kB
Mapped: 120376 kB
Shmem: 836 kB
Slab: 486412 kB
SReclaimable: 438084 kB
SUnreclaim: 48328 kB
KernelStack: 10080 kB
PageTables: 16512 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12327728 kB
Committed_AS: 4850404 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 40864 kB
VmallocChunk: 34359693312 kB
HardwareCorrupted: 0 kB
AnonHugePages: 370688 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 364368 kB
DirectMap2M: 16412672 kB

@fusionswap commented Oct 18, 2018

I ran 2.4.3 with an empty data directory and I am again seeing this issue. It happened after 4 days.

Oct 18 02:00:06 lpqosput50155 prometheus: level=info ts=2018-10-18T09:00:06.958845312Z caller=compact.go:398 component=tsdb msg="write block" mint=1539842400000 maxt=1539849600000 ulid=01CT36EBE1D9ZR2GJ0YATSKRM2
Oct 18 02:00:08 lpqosput50155 prometheus: level=info ts=2018-10-18T09:00:08.159754662Z caller=head.go:446 component=tsdb msg="head GC completed" duration=402.834289ms
Oct 18 02:00:14 lpqosput50155 prometheus: level=info ts=2018-10-18T09:00:14.051172095Z caller=head.go:477 component=tsdb msg="WAL checkpoint complete" low=60 high=61 duration=5.891260795s
Oct 18 02:00:22 lpqosput50155 prometheus: level=info ts=2018-10-18T09:00:22.164041167Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3 mint=1539820800000 maxt=1539842400000 ulid=01CT36ERTB40THQXV3QWX28G5F sources="[01CT2HV5QM37VHWARZ1EGPB257 01CT2RPWZA1G1VWPN5S16S3B5Y 01CT2ZJM75VBMVH0GHGX5MV492]"
Oct 18 02:00:22 lpqosput50155 prometheus: level=error ts=2018-10-18T09:00:22.883725617Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /amex/prometheus/data/01CT36ERTB40THQXV3QWX28G5F: mmap files: mmap: cannot allocate memory"
Oct 18 02:00:23 lpqosput50155 prometheus: level=error ts=2018-10-18T09:00:23.898640901Z caller=db.go:305 component=tsdb msg="compaction failed" err="compact [/amex/prometheus/data/01CT2HV5QM37VHWARZ1EGPB257 /amex/prometheus/data/01CT36ERTB40THQXV3QWX28G5F /amex/prometheus/data/01CT2RPWZA1G1VWPN5S16S3B5Y /amex/prometheus/data/01CT2ZJM75VBMVH0GHGX5MV492]: mmap files: mmap: cannot allocate memory"

After this compaction-failed message, block directories keep getting created at one-minute intervals.

@fusionswap commented Oct 18, 2018

Well, this was a stupid mistake on our side. We were running linux-386 (32-bit) binaries on our x86-64 servers, so the process was limited to about 4G of addressable memory.

However, as stated by @simonpasquier, there is still a bug: after compaction of block directories fails a few times due to memory allocation issues, Prometheus starts duplicating directories every minute until it stops running.

@krasi-georgiev (Member) commented Oct 18, 2018

@fusionswap, thanks for the update, I will remember that one :)

@iDemonix, @veox is it possible that your issue is the same? 32 bit binary on a 64 bit machine?
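A quick way to check (the binary path is an assumption):

uname -m                          # machine architecture
file /usr/local/bin/prometheus    # e.g. "ELF 64-bit ... x86-64" vs "ELF 32-bit ... Intel 80386"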

@krasi-georgiev (Member) commented Oct 18, 2018

@iDemonix (Author) commented Oct 18, 2018

@krasi-georgiev It is 100% possible but I can't check now as the box was trashed and I went with a Dockerised install instead! I'm not sure why I would have got the wrong arch, but again, could have accidentally pulled the wrong one!

I'll try a rebuild at some point on the same CentOS release on a vanilla install, to see if it happens again.

@krasi-georgiev (Member) commented Oct 18, 2018

Since two people have confirmed, shall we close it and reopen if we get other reports?

@iDemonix (Author) commented Oct 18, 2018

I'm happy to close until I get chance to re-test! If anyone confirms that using the correct binary fixes this, they can reopen. Thanks for the help.

@iDemonix closed this Oct 18, 2018

@veox commented Oct 19, 2018

@krasi-georgiev No, I'm running a 32-bit local build, on a 32-bit machine. (This originally said a 64-bit build on a local machine; see the correction below.)

It's OK for the issue to be closed as far as I'm concerned, because (apparently) it's expected that Prometheus should be allowed by the system to allocate as much memory as it wants. This is at the core of the issue, and a design choice that I'm not willing to argue.

Especially since I'm not contributing any code that will help. :)


JIC someone else's running a limited-hardware set-up - my workaround:

  • lowering the retention period (via --storage.tsdb.retention=10d) based on the average amount of data that gets collected per day, so that the TSDB doesn't grow beyond 1 GiB (a limit established empirically, for the specific hardware/OS combination used);
  • copying compacted, historic blocks to a different directory (using a cronjob; a sketch follows below), a bit before they get "cleaned up" by Prometheus itself running on this node.
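A minimal sketch of that cronjob — the destination directory and the twice-a-day schedule are arbitrary; the data path is the one from my earlier comment:

# /etc/cron.d/prometheus-archive (sketch): copy finished blocks out twice a day,
# before the retention clean-up removes them
0 */12 * * * prometheus rsync -a --ignore-existing --exclude wal --exclude lock /home/prometheus/data/ /home/prometheus/archive/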

So far (for ~2 months now) this set-up has "worked": i.e., no more loss of measurements because of disk getting full, because no more thrashing the disk with identical uncompacted blocks, because no more failure to map memory while doing compaction.

@krasi-georgiev (Member) commented Oct 19, 2018

The OS in my case is an Arch Linux Arm 32-bit (architecture ID armv7h).

@veox does it mean you upgraded the OS? One of your comments mentioned 32-bit?

In summary:

  • It is NOT expected to see mmap: cannot allocate memory errors on a 64-bit OS when the system has some free memory.
  • It is expected to see increased memory usage during compaction, and even more at higher retention periods, since it will compact larger blocks. So if the system doesn't have enough memory, Prometheus will crash. (Suggestions on how to improve the design are welcome :)
  • It is expected that Prometheus starts normally after a compaction crash (I think this was fixed from 2.4 onward; tmp files might be left behind, so these need to be deleted manually).

So if you observe anything different, let me know and I will reopen to continue troubleshooting.
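For reference, the leftover temporary directories from an interrupted compaction carry a .tmp suffix (as far as I can tell), so with Prometheus stopped something like this cleans them up (data path taken from earlier in the thread):

find /var/prometheus/data -maxdepth 1 -type d -name '*.tmp' -exec rm -rf {} +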

@veox commented Oct 20, 2018

does it mean you upgraded the OS?

@krasi-georgiev Sorry; that means I'm talking nonsense, or otherwise out of my mind. X_X

Edited the comment: running a 32-bit local build, on a 32-bit machine (as compared to 32-bit on 64-bit). An embarrassing brain-fart, no less.

suggestion how to improve the design are welcome :)

Thank you for your patience. I'll open a PR if I have something concrete.

@klaper commented Oct 28, 2018

I'm experiencing the same issue here.
My configuration:

  • Raspberry PI 3 B+
  • Raspbian Stretch Lite October 2018
  • prometheus, version 2.4.3+ds (branch: debian/sid, revision: 2.4.3+ds-2) installed from armhf deb package (https://packages.debian.org/sid/net/prometheus) (so I guess no 64bit on 32bit)
  • --storage.tsdb.retention=15y --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h but problem also appears on default settings of min/max block durations.

The provided screenshots are from the standard Grafana dashboard offered to me on Prometheus data-source setup. I've hit this issue a couple of times, but this is the first time I've caught it with Prometheus still working (it shuts down after a while, but I have no idea why).

From the start, the Prometheus process uses up a bit more memory every 2 hours (the graph is interrupted by a system shutdown, but virtual memory usage came back to the same level):
(screenshot: total memory)
The problem starts when it has used up 2 GB of virtual memory, as shown in the screenshot:
(screenshot: memory usage)
And since 14:00 (the last virtual memory increase) I've been getting these in the logs (no Prometheus restart needed):

Oct 28 14:00:02 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:02.391394751Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC4Z59TK1K0WBKFYDNKD1F
Oct 28 14:00:02 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:02.73656226Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:00:06 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:06.12269913Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC52S2SY6GXRM5ZQMA5M5D
Oct 28 14:00:06 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:06.476283284Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:00:10 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:10.723025209Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC57CW0H2ZV5GHX56BY7GF
Oct 28 14:00:10 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:10.981200751Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:00:17 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:17.394042909Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC5DR5DES5D979CED7FZ58
Oct 28 14:00:17 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:17.739458542Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:00:28 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:28.099928811Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC5R8BM6MCQGA5BXW9MHN5
Oct 28 14:00:28 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:28.442243411Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:00:46 yetipi prometheus[26089]: level=info ts=2018-10-28T13:00:46.776944714Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXC6AGTFBB415GA5DC8B5HN
Oct 28 14:00:47 yetipi prometheus[26089]: level=error ts=2018-10-28T13:00:47.046617832Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"

And since then every minute or so:

Oct 28 14:48:37 yetipi prometheus[26089]: level=info ts=2018-10-28T13:48:37.592643993Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXEXX11KD52EACVKWZ6MFW1
Oct 28 14:48:38 yetipi prometheus[26089]: level=error ts=2018-10-28T13:48:38.350459836Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:49:41 yetipi prometheus[26089]: level=info ts=2018-10-28T13:49:41.188675433Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXEZVNG99NWG8R7JH5MGY0R
Oct 28 14:49:41 yetipi prometheus[26089]: level=error ts=2018-10-28T13:49:41.669878998Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:50:44 yetipi prometheus[26089]: level=info ts=2018-10-28T13:50:44.467262825Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXF1SG8ZHN6GHX955TDXQ2Z
Oct 28 14:50:45 yetipi prometheus[26089]: level=error ts=2018-10-28T13:50:45.118138542Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:51:47 yetipi prometheus[26089]: level=info ts=2018-10-28T13:51:47.75997761Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXF3QEZBF1J5NKSK8EMKF72
Oct 28 14:51:48 yetipi prometheus[26089]: level=error ts=2018-10-28T13:51:48.104461221Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:52:50 yetipi prometheus[26089]: level=info ts=2018-10-28T13:52:50.723599372Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXF5MZ860NVDAHRX0X1JA7V
Oct 28 14:52:51 yetipi prometheus[26089]: level=error ts=2018-10-28T13:52:51.093966007Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"
Oct 28 14:53:55 yetipi prometheus[26089]: level=info ts=2018-10-28T13:53:55.364512794Z caller=compact.go:398 component=tsdb msg="write block" mint=1540720800000 maxt=1540728000000 ulid=01CTXF7JFZ38C55CE4HENNZVE3
Oct 28 14:53:55 yetipi prometheus[26089]: level=error ts=2018-10-28T13:53:55.995176509Z caller=db.go:305 component=tsdb msg="compaction failed" err="reload blocks: open block /mnt/storage/prometheus/metrics2/01CTXC4Z59TK1K0WBKFYDNKD1F: mmap files: mmap: cannot allocate memory"

Also, compactions then started appearing:
(screenshot: compaction graph)
And I get new block directories every minute:

drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 12:00 01CTX597WZ12PYZNWFNNBS2A47
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC4Z59TK1K0WBKFYDNKD1F
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC52S2SY6GXRM5ZQMA5M5D
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC57CW0H2ZV5GHX56BY7GF
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC5DR5DES5D979CED7FZ58
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC5R8BM6MCQGA5BXW9MHN5
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:00 01CTXC6AGTFBB415GA5DC8B5HN
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:01 01CTXC7CA7JN7BT9W38MTJ5052
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:02 01CTXC99GVJTJZVQ1SFSAW4B45
drwxr-xr-x  3 prometheus prometheus  4096 Oct 28 14:03 01CTXCB6YW7VQNW77DJ4E19XV2

Some relevant data:

root@yetipi:/proc/26089# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             7291                 7291                 processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       7291                 7291                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
root@yetipi:/mnt/storage/prometheus/metrics2# cat /proc/sys/kernel/shmmax
4278190079
root@yetipi:/mnt/storage/prometheus/metrics2# cat /proc/sys/kernel/shmall
4278190079
root@yetipi:/mnt/storage/prometheus/metrics2# cat /proc/sys/kernel/shmmni
4096
root@yetipi:/mnt/storage/prometheus/metrics2# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  13312  77996  26664 538396    0    0    17    14   17   34 10  2 88  0  0

And an strace of what happens when Prometheus writes those errors to the logs.

@krasi-georgiev (Member) commented Oct 28, 2018

I am pretty sure that the problem occurs when compaction expands the block, and it could probably be fixed with some workaround, but I personally don't think it is worth the trouble of supporting 32-bit systems.

Would be interested to see what @fabxc and @gouthamve have to say about this.

@klaper commented Oct 28, 2018

Stacktrace of the dying Prometheus. I guess it might be helpful, in case you decide it's worth fixing for 32-bit.
