Prometheus fails to write WAL due to the `cannot allocate memory` error #4014
Comments
YanjieGao commented Mar 29, 2018
I hit the same error. Is there a Prometheus parameter we can set to tune memory usage?
There are no memory parameters in Prometheus 2.0, so there is nothing to tune. It has happened two more times. I tried flushing all data, but that did not help. This time it happened without a restart of Prometheus.
Your machine is tight on memory; reduce usage or add more.
There is 10G of free memory on the machine, and the average usage of the Prometheus instance is 2G of memory. It has 400k time series. Is this really not sufficient? The documentation no longer has a section on how to determine the memory required for the server. It has been running on this machine for months with no memory problems; this only started after the 2.2.1 upgrade.
Prometheus uses a lot of virtual memory; it's possible that something is up with your kernel and it doesn't like that.
Thanks for the response. Unfortunately, I still cannot find the "something". I have two identical instances (both suffering from this issue) running in different DCs, so I deployed … Were there any changes that could cause a burst in memory usage on compaction? I'm confused by the fact that it started after the …
I see no changes that make a difference on Linux. This looks like your kernel refusing to allocate virtual memory; check your overcommit settings.
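For anyone following the overcommit suggestion, here is a minimal sketch of one way to inspect those settings on Linux. It assumes the standard `/proc/sys/vm/overcommit_memory` and `/proc/meminfo` files are readable; the helper names `parse_meminfo` and `commit_headroom_kb` are made up for illustration and are not part of Prometheus. Inside a container these files usually reflect the host rather than the container's own limits.

```python
import os

# Meaning of vm.overcommit_memory values, per the Linux kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting: allocations fail once Committed_AS would exceed CommitLimit",
}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(meminfo):
    """Return CommitLimit - Committed_AS in kB, or None if either field is missing."""
    if "CommitLimit" not in meminfo or "Committed_AS" not in meminfo:
        return None
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


if __name__ == "__main__":
    mode_path = "/proc/sys/vm/overcommit_memory"
    if os.path.exists(mode_path) and os.path.exists("/proc/meminfo"):
        with open(mode_path) as f:
            mode = int(f.read().strip())
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("vm.overcommit_memory =", mode, "->", OVERCOMMIT_MODES.get(mode, "unknown"))
        print("commit headroom (kB):", commit_headroom_kb(info))
    else:
        print("no /proc overcommit information on this system")
```

A negative headroom only causes allocation failures in mode 2; in modes 0 and 1 the kernel reports `CommitLimit` but does not enforce it.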
So I think we can close this. It turned out to be my own mistake: I misread the data from cAdvisor (I had mixed data from multiple namespaces). Sorry for bothering you with this. After correcting the resources, all seems to be fine again. Thanks!
FUSAKLA closed this Apr 6, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
FUSAKLA commented Mar 26, 2018
What did you do?
Restarted Prometheus
What did you expect to see?
Prometheus to come up as usual
What did you see instead? Under which circumstances?
Prometheus crashed during compaction with `cannot allocate memory` and afterwards stopped ingesting any data because of `write /prometheus/wal/000943: cannot allocate memory`. After a restart it is OK again. I'm confused, because it has resources available and there is no OOM kill from Kubernetes.
Nothing indicates there is any problem with memory.
The worst thing is that Prometheus is still running and no alerts are dispatched (except those using `absent`) because Prometheus has no data at all.
It has already happened twice.
Unfortunately I'm not able to reproduce it.
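Since the server stays up while ingesting nothing, one possible way to notice this from outside is to check how old the newest sample is. The sketch below assumes the caller has already sent the query `timestamp(up)` to the HTTP API endpoint `/api/v1/query` and has the response body as text; the function names and the 300-second threshold are hypothetical choices for illustration, not something Prometheus ships.

```python
import json
import time


def newest_sample_age(response_text, now=None):
    """Age in seconds of the newest sample in a `timestamp(up)` instant-query
    response, or None if the query failed or returned no series."""
    now = time.time() if now is None else now
    body = json.loads(response_text)
    if body.get("status") != "success":
        return None
    results = body.get("data", {}).get("result", [])
    # For timestamp(up), each series' value is [eval_time, "<sample timestamp>"].
    sample_times = [float(r["value"][1]) for r in results if "value" in r]
    if not sample_times:
        return None
    return now - max(sample_times)


def ingestion_looks_stalled(response_text, max_age_seconds=300, now=None):
    """True if there are no samples at all or the newest one is too old."""
    age = newest_sample_age(response_text, now=now)
    return age is None or age > max_age_seconds
```

An empty result is treated as stalled on purpose, because a server with no data at all returns no series for `up`.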
Environment
Running in Kubernetes with data mounted as `hostPath`
Resources info from `kubectl describe node`:
System information: official docker image
Prometheus version: 2.2.1
Logs:
Maybe it's not Prometheus's fault, but if so, could you suggest where to look?
I'm out of ideas about what could have caused this.