Consul connection leak #3096
Comments
Can you confirm that …
@brian-brazil: Yes. Here's the same graph with …
process_open_fds is a gauge; can you graph it without the rate?
brian-brazil added kind/bug and removed kind/more-info-needed labels on Aug 21, 2017
brian-brazil added the priority/P2 label on Aug 21, 2017
I've been able to reproduce this with a pretty basic setup: Prometheus and Consul running via Docker Compose, where Consul has no services registered for SD but Prometheus is configured to scrape Consul and itself. I then ran a curl loop to trigger repeated config reloads (see the sketch below). Looking into the cause.
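The exact command was lost in formatting above; based on the later reference to "the previous curl command", it was presumably a loop hammering the reload endpoint. A rough Go equivalent of that stand-in, where the localhost:9090 address is an assumption from a default Prometheus port mapping in the Compose setup:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// Repeatedly POST to Prometheus's /-/reload endpoint to force config
// reloads, which is what appeared to drive the fd growth in this issue.
// localhost:9090 is an assumed port mapping from the docker-compose
// setup described above.
func main() {
	for {
		resp, err := http.Post("http://localhost:9090/-/reload", "", nil)
		if err != nil {
			log.Println("reload failed:", err)
		} else {
			resp.Body.Close() // close so this loop doesn't leak fds itself
		}
		time.Sleep(1 * time.Second)
	}
}
```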
@fabxc I think you were right. After setting a timeout on the HTTP client used for Consul SD and rerunning the previous curl command, the graph looks different: unlike before, where the number of open fds continuously increased, open fds now stabilize at the point where the number opened via reloads and the number closed periodically via the timeout balance out. I'm hoping someone might have insight into why, without the timeout, I was seeing the count of goroutines running … continuously climb.
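For reference, a minimal sketch of the shape of such a fix in Go; the function name is hypothetical and the fields and durations are illustrative assumptions, not necessarily what the eventual PR used:

```go
package main

import (
	"io"
	"log"
	"net"
	"net/http"
	"time"
)

// newSDHTTPClient (a hypothetical name) builds an HTTP client whose
// connections can neither take forever to establish nor sit idle
// indefinitely. Without bounds like these, connections held by a client
// that a config reload has discarded can linger, matching the leak
// pattern seen in this issue.
func newSDHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   30 * time.Second, // bound connection establishment
				KeepAlive: 30 * time.Second,
			}).DialContext,
			IdleConnTimeout: 90 * time.Second, // reap idle keep-alive connections
		},
	}
}

func main() {
	client := newSDHTTPClient()
	// localhost:8500 matches the local Consul from the issue report.
	resp, err := client.Get("http://localhost:8500/v1/catalog/services")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
}
```

One caveat, as an assumption about how Consul SD behaves: its watches use long-polling (blocking) queries, so a blunt overall client Timeout can cut healthy watches short; bounding dial time and idle-connection lifetime is the gentler lever.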
Note that disabling keep-alives for the same …
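A minimal sketch of that keep-alive angle, assuming it refers to the Go HTTP transport (the function name is hypothetical): with DisableKeepAlives set, each request gets a fresh connection that is closed once the response body is consumed, so idle connections cannot pile up across reloads.

```go
package main

import (
	"log"
	"net/http"
)

// newNoKeepAliveClient (hypothetical name) returns a client that never
// keeps connections alive: each request opens a new connection and
// closes it after the response body is drained. This trades the idle
// connection leak for a TCP handshake on every SD poll.
func newNoKeepAliveClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DisableKeepAlives: true,
		},
	}
}

func main() {
	client := newNoKeepAliveClient()
	resp, err := client.Get("http://localhost:8500/v1/agent/self")
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}
```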
cstyan referenced this issue on Oct 16, 2017: use a timeout in the HTTP client used for consul sd #3303 (merged)
This can probably be marked as fixed.
brian-brazil closed this on Oct 30, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.



christoe commented Aug 21, 2017
What did you do?
We're running Prometheus 1.6.3 with Consul SD config.
What did you expect to see?
Prometheus running forever
What did you see instead? Under which circumstances?
We're quite rapidly running out of fds. The leak comes in the form of connections to Consul (running locally, so connections to localhost:8500). Currently we need to restart Prometheus several times per week due to fd exhaustion. We first saw it on 1.3, upgraded to 1.4 and then 1.5.2 and still had the issue, and we're now running 1.6.3.
Here's node_filefd_allocated on the machine in question, rate(1h):

[graph: rate of node_filefd_allocated climbing until the fd limit is hit, with gaps where scraping stopped]

On the graph above it's easy to see that the fd allocation rate goes up until it reaches the limit and causes Prometheus to stop scraping (the blanks in between the peaks). Older graphs with Prometheus/node fd correlation here: #1873 (comment)
We see some correlation between configuration reloads and the rate of allocation; frequent reloads seem to make the problem worse. We use consul-template to generate the Prometheus configuration file and have had issues with flapping services causing forced reloads. We've implemented better handling of flapping, but the rate of new services appearing in the environment alone causes a couple of reloads per day, so we'll never get rid of reloads entirely.
Environment
System information:
Linux 3.10.0-327.28.3.el7.x86_64 x86_64
Prometheus version:
prometheus, version 1.6.3 (branch: master, revision: c580b60)
build user: root@a6410e65f5c7
build date: 20170522-09:15:06
go version: go1.8.1