2.3.0 - WAL log samples: log series: /wal/007913: file already closed #4303

VR6Pete commented Jun 22, 2018

Upgraded to 2.3.0 yesterday and am now getting these errors in the "targets" screen and in journalctl. If it makes any difference, I configured blackbox at the same time to scrape around 200 URLs.

WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already closed

om:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already clo
.com:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already clo
oud.com:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already
oud.com:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already
.com:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already clo
.com:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007913: file already clo
be?module=http_2xx&target=https%3A%2F%2FXXXX.XXXX-XXX.com%2XXXXCareMobile%XXXXX" msg="append failed" err="WAL log samp

Performing a restart works, but it quickly falls over again with the same error.
Comments
Can you show the full logs? These are cut. I have seen this error caused by low disk space once, so that's worth checking.
WAL log samples: log series: write /data/prometheus/metrics2/wal/007938: file already closed

Just fell over again.

/dev/xvdf  493G  293G  179G  63%  /data

Plenty of storage available.

Jun 22 15:55:52 ukasprom01 prometheus[1845]: level=warn ts=2018-06-22T15:55:52.98659825Z caller=scrape.go:717 component="scrape manager" scrape_pool="RC - App" target=http://XXXXXX:9182/metrics msg="append failed" err="WAL log samples: log series: write /data/prometheus/metrics2/wal/007938: file already closed"
Jun 22 15:56:03 ukasprom01 prometheus[1845]: level=error ts=2018-06-22T15:56:03.661810827Z caller=wal.go:713 component=tsdb msg="sync failed" err="flush buffer: write /data/prometheus/metrics2/wal/007938: file already closed"

ARGS="--storage.tsdb.retention=365d --storage.tsdb.max-block-duration=1d --storage.tsdb.path="/data/prometheus/
You're running out of file descriptors; make sure you have a high ulimit.
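For reference, the limit that matters is the one on the running process, not the shell's. A quick sketch for checking it, assuming the binary is named "prometheus":

    # effective limits of the running Prometheus process
    cat /proc/$(pidof prometheus)/limits | grep 'Max open files'

The soft value in that output is what the kernel actually enforces for the process, regardless of what any config file says.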
How do I set this?
Cheers, tweaked this - will monitor over the weekend.
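The exact tweak isn't quoted in the thread; per the later comments it was an entry in /etc/security/limits.conf. A typical entry raising the open-file limit looks like the following, where the "prometheus" user name and the 65536 value are illustrative:

    # /etc/security/limits.conf
    prometheus  soft  nofile  65536
    prometheus  hard  nofile  65536

As the rest of the thread shows, limits.conf only applies to PAM login sessions, not to services started by systemd.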
Sure thing. Closing now, but feel free to reopen if it happens again. Thanks!
krasi-georgiev closed this Jun 22, 2018
Unfortunately the same thing happened again this morning, same symptoms...
repair.go:39 component=tsdb msg="found healthy block" mint=1529892000000 maxt=1529899200000 ulid=01CGTN085E67CH51V4YYNMS6MV
onaws.com on 172.18.91.149:53: dial udp 172.18.91.149:53: socket: too many open files" |
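That "socket: too many open files" error is the same fd exhaustion surfacing in DNS lookups, since each lookup needs a socket descriptor too. A quick way to see how many descriptors the process currently holds (again assuming the binary is named "prometheus"):

    # count of currently open descriptors
    ls /proc/$(pidof prometheus)/fd | wc -l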
@VR6Pete please paste the first lines of the log file to check the effective fd limits, as it's sometimes tricky to set them properly. They should be reported like this:
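Presumably a line of this shape; the values below are taken from the later reply in this thread, where Prometheus 2.x reports its limits on startup:

    level=info ts=... caller=main.go:225 fd_limits="(soft=65536, hard=65536)"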
ubuntu@ukasprom01:/etc/prometheus$ journalctl -u prometheus.service | grep soft

Yes, it doesn't look like it has taken effect. The limits are set in /etc/security/limits.conf; I thought that Prometheus would pick this change up.
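A likely explanation, if the service is started by systemd: /etc/security/limits.conf is applied by PAM at login time, and systemd services do not go through PAM, so those entries never reach the process. What systemd itself will grant the unit can be checked directly (the unit name here is an assumption):

    # the open-files limit systemd will set for the unit
    systemctl show prometheus.service -p LimitNOFILE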
It depends on how your Prometheus service is managed. Do you use systemd?
Yes, it's an Ubuntu system.
You need to set it via the systemd unit's process properties (LimitNOFILE): https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Process%20Properties
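A minimal sketch of that change; the unit name and the 65536 value are assumptions chosen to match what the next comment reports:

    # create a drop-in override instead of editing the packaged unit
    sudo systemctl edit prometheus.service

    # add these lines in the editor that opens:
    [Service]
    LimitNOFILE=65536

    # apply and restart
    sudo systemctl daemon-reload
    sudo systemctl restart prometheus.service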
OK, that has now been read in:

prometheus[1269]: level=info ts=2018-06-25T10:55:42.910572336Z caller=main.go:225 fd_limits="(soft=65536, hard=65536)"

I have noticed the following message on starting Prometheus; is there any action to be taken?

caller=head.go:320 component=tsdb msg="unknown series references in WAL samples" count=55361
Regarding the "unknown series references in WAL samples" message, I would assume it will go away after the first compaction and a clean stop/start of the service. @krasi-georgiev?
Haven't looked at that part of the code yet, but I assume it should clear those once they are past the time range.
linkbug commented Feb 21, 2019
@linkbug lots of changes have been made since 2.3; would you mind opening a new issue with steps to reproduce?
