Prometheus 2.0.0 - Panic on reload due to fd limit. #3446
Comments
grobie added the component/service discovery, kind/bug, and priority/P2 labels Nov 9, 2017
I think I understand the cause of the problem after looking at the code:
However, I'm unable to trigger the bug to verify my hypothesis. Do you have some guidelines to reproduce it?
@simonpasquier maybe try using
You can adjust the ulimit of the running Prometheus with Try this:
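For anyone trying to reproduce this, one possible way to change the open-file limit of an already running process on Linux is the prlimit(2) system call, which Python exposes as `resource.prlimit`. The sketch below is only an assumption about how that could be done and is not the command referred to above; `set_nofile_limit` is a hypothetical helper name.

```python
import resource
from typing import Tuple


def set_nofile_limit(pid: int, soft: int) -> Tuple[int, int]:
    """Set the soft open-file limit of a process; return the previous (soft, hard).

    Relies on prlimit(2) through resource.prlimit, so it is Linux-only.
    pid=0 means the calling process.
    """
    # Read the current limits so the hard limit is left unchanged.
    _, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # The soft limit can never exceed the hard limit.
        soft = min(soft, hard)
    # When new limits are passed, prlimit returns the former limits.
    return resource.prlimit(pid, resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    # Demo on the current process; pass the PID of Prometheus instead of 0
    # to target a running server.
    previous = set_nofile_limit(0, 256)
    print("previous:", previous)
    print("current:", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Changing the limits of another process generally requires running as the same user or with elevated privileges.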
andreasnuesslein referenced this issue Feb 21, 2018: Too many open sockets and fds on reload #3873 (Closed)
Are you still seeing this?
theonlydoo commented May 29, 2018
I do, with this systemd.unit service file:
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
MemoryAccounting=true
MemoryLimit=24G
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
User=prometheus
Restart=always
RestartSec=2
EnvironmentFile=/etc/default/prometheus
ExecStart=/usr/bin/prometheus $ARGS
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill $MAINPID
TimeoutStopSec=300
[Install]
WantedBy=multi-user.target
May I also add that some targets are not scraped as a consequence.
A diagnostic element:
@theonlydoo can you share the logs when it crashes?
theonlydoo commented Aug 1, 2018
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766264436Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766259599Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766266961Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766303485Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766319538Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766332945Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766355581Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
juil. 31 14:59:33 prometheus prometheus[29940]: level=error ts=2018-07-31T12:59:33.766350559Z caller=file.go:223 component="discovery manager scrape" discovery=file msg="Error adding file watcher" err="too many open files"
The config pattern is:
My guess is that the daemon doesn't release the deleted fd.
prometheus, version 2.3.1 (branch: HEAD, revision: 188ca45bd85ce843071e768d855722a9d9dabe03)
build user: root@82ef94f1b8f7
build date: 20180619-15:56:22
go version: go1.10.3
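One way to check the guess about unreleased descriptors on Linux is to look at where the process's entries under /proc/<pid>/fd point. The following is a minimal sketch, assuming a Linux host with /proc mounted; `classify_fds` is a hypothetical helper name, and the categories are only the ones relevant to this thread (inotify instances, deleted files, sockets).

```python
import os
from collections import Counter


def classify_fds(pid="self"):
    """Count a process's open descriptors by kind via /proc/<pid>/fd symlinks."""
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target == "anon_inode:inotify":
            # Each inotify instance shows up as one anonymous inode.
            counts["inotify"] += 1
        elif target.endswith(" (deleted)"):
            # The file was unlinked but is still held open.
            counts["deleted_file"] += 1
        elif target.startswith("socket:"):
            counts["socket"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Inspect the current process; pass a PID (for example the one shown
    # in the logs) to inspect another process you are allowed to read.
    print(dict(classify_fds()))
```

Running it a few times across reloads and comparing the counts would show whether one category keeps growing.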
@theonlydoo to be clear, Prometheus is hitting the fd limits but it doesn't panic. Can you share the start of the logs too? There should be a line looking like
theonlydoo commented Aug 1, 2018
yep, it doesn't crash, it just doesn't scrape my targets anymore, which is kinda worse because it's hard to see without having a dashboard in front of me :-)
my systemd file sets those limits:
That's running into the inotify limit.
@theonlydoo specifically you need to increase
Closing this issue since the panic doesn't seem to happen anymore.
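For context, Linux keeps per-user inotify limits that are separate from the per-process open-file limit, so a large LimitNOFILE in the unit file does not raise them. A minimal sketch for reading the current values, assuming the standard /proc/sys/fs/inotify location; `read_inotify_limits` is a hypothetical helper name, and the sketch does not say which of the three values is the one to increase.

```python
from pathlib import Path

# Standard location of the inotify sysctls on Linux.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(base=INOTIFY_DIR):
    """Return the per-user inotify limits as a dict of ints (None if unreadable)."""
    limits = {}
    for name in ("max_user_instances", "max_user_watches", "max_queued_events"):
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file (non-Linux host, restricted container) or odd content.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for key, value in read_inotify_limits().items():
        print(f"fs.inotify.{key} = {value}")
```

These values are typically changed with sysctl under the `fs.inotify.*` keys.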
simonpasquier closed this Sep 12, 2018
theonlydoo commented Sep 12, 2018
Done, I'll keep you posted on this issue if it ever comes back (testing on 2 different versions)
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
SuperQ commented Nov 9, 2017
What did you do?
SIGHUP Prometheus
What did you expect to see?
Safe reload
What did you see instead? Under which circumstances?
With a file ulimit 4096 max, I got a SIGSEGV. See trace below.
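To illustrate what hitting a file ulimit looks like at the OS level, here is a minimal sketch that temporarily lowers the soft limit of the current process and opens /dev/null until the kernel refuses. It does not reproduce the Prometheus panic; it only shows the underlying EMFILE error. `open_until_limit` is a hypothetical helper name, and the sketch assumes a Unix-like system where the `resource` module is available.

```python
import errno
import os
import resource


def open_until_limit(soft_limit):
    """Temporarily lower the soft RLIMIT_NOFILE and open /dev/null until EMFILE.

    Returns (number of descriptors opened, OS error message).
    Raises ValueError if soft_limit exceeds the hard limit.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    message = None
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        if exc.errno != errno.EMFILE:
            raise
        # EMFILE is the per-process "too many open files" condition.
        message = os.strerror(exc.errno)
    finally:
        # Release everything and restore the original soft limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(fds), message


if __name__ == "__main__":
    opened, message = open_until_limit(64)
    print(f"opened {opened} descriptors before: {message}")
```

The count is lower than the limit because descriptors that are already open (stdin, stdout, stderr, and so on) count against it too.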
Environment
System information:
Linux 4.4.0-66-generic x86_64
Prometheus version: