Consul SD Config connection leak #4425
Comments
Do the net_conntrack_dialer metrics also indicate it's consul?
@brian-brazil
FYI: I won't have a stable internet connection in the next two weeks, but I'll look at this issue as soon as I'm back. Looking at your configuration, you use metric relabeling to filter services. If you have a lot of services, this is likely to watch all of them and then drop the ones that don't have the tag. Check https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cconsul_sd_config%3E, you can use the […]
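For illustration, a scrape config that filters on the Consul side (as the advice above and the consul_sd_config documentation suggest) could look roughly like the sketch below. The job name, server address, and the exact filter field are assumptions, not taken from the reporter's setup:

```yaml
scrape_configs:
  - job_name: consul-exporters        # hypothetical job name
    consul_sd_configs:
      - server: 'localhost:8500'      # assumed Consul agent address
        token: token-token-token
        datacenter: dc
        # Filtering here means Prometheus only watches matching services,
        # instead of watching every service and dropping targets via relabeling.
        tag: prometheus_exporter      # assumed single-tag filter field from the docs of that era
```

Filtering at the SD level keeps the number of watched Consul services, and therefore open watch connections, down.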
I guess I've gotten a little closer to this problem. While on the other node I have this value below 900. The configurations are the same.
upd: That thing surprised me, as I have only 263 node targets.
upd2:
surmehta commented Jul 30, 2018
We are using Prometheus version 2.3.1 and facing the same issue. We also changed from the metric relabeling option to using the tag for filtering services, and we are still seeing connection leaks.
Here is the graph with the net_conntrack_dialer_conn_established_total metric. There seems to be a correlation between the connection leaks and configuration reloads (http://localhost:9090/-/reload): with each reload there is an increase in the connection count, and the total connection count never goes down.
net_conntrack_dialer_conn_established_total is a counter, so it will always go up; if it's increasing faster than net_conntrack_dialer_conn_closed_total you have a problem.
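As a rough illustration of that comparison, a rule along the following lines could flag a dialer (identified by the dialer_name label, e.g. "consul") whose established connections keep outpacing its closed ones. The group name, alert name, window, and threshold here are made up:

```yaml
groups:
  - name: connection-leak-check        # hypothetical rule group
    rules:
      - alert: DialerConnectionLeak    # hypothetical alert name
        # Established connections growing faster than closed ones, per dialer.
        expr: |
          rate(net_conntrack_dialer_conn_established_total[15m])
            - rate(net_conntrack_dialer_conn_closed_total[15m]) > 0
        for: 1h
        annotations:
          summary: 'Dialer {{ $labels.dialer_name }} may be leaking connections'
```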
surmehta commented Jul 30, 2018
That sounds like a leak then, but not in Consul. This is something we encountered before but never fully resolved. Are you using HTTPS?
surmehta commented Jul 30, 2018
No, we are using HTTP.
@brian-brazil @surmehta Another thing I've noticed is that our second node, the one we actually have problems with, keeps flipping between active/failed statuses in the Consul Serf monitor. So it does at least cause constant reloads, as the consul-agent sees that the node is back active. I'm not sure how that can affect connection pool utilization. Maybe we can have a solution for this soon.
surmehta commented Jul 31, 2018
@ashepelev We are using the client-go library to watch for changes in specific Kubernetes resources and call the reload API.
I confirm that Consul SD is leaking connections on reloads. Working on a fix.
simonpasquier referenced this issue on Aug 1, 2018: discovery/consul: close idle connections on stop #4443 (merged)
simonpasquier added the kind/bug label and removed the kind/more-info-needed label on Aug 1, 2018
simonpasquier closed this in #4443 on Aug 10, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Original issue description:
ashepelev commented Jul 26, 2018 (edited)
Proposal
Use case. Why is this important?
Consul SD Target Discovery
Bug Report
What did you do?
Using consul_sd_config
What did you expect to see?
Targets being discovered and working stably, without restarts.
What did you see instead? Under which circumstances?
The host is running out of FDs. The FDs are used for established connections to Consul.
Environment
Ubuntu 16.04
System information:
Linux 4.4.0-83-generic x86_64
Prometheus version:
prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e2)
build user: root@5258e0bd9cc1
build date: 20180712-14:02:52
go version: go1.10.3
Alertmanager version:
alertmanager, version 0.15.1 (branch: HEAD, revision: 8397de1830f154535a31150f9262da0072d8725d)
build user: root@efde7f9485ae
build date: 20180712-18:25:27
go version: go1.10.3
Prometheus configuration file:

    consul_sd_configs:
      - token: token-token-token
        datacenter: dc
    relabel_configs:
      - regex: ^.*prometheus_exporter.*$
        action: keep
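The relabel_configs entry above appears to have lost its source_labels line in this report. For reference only, a tag-based keep rule for Consul targets is usually written against the __meta_consul_tags label, roughly as below; the exact label and regex are an assumption about what was intended:

```yaml
relabel_configs:
  - source_labels: [__meta_consul_tags]
    # __meta_consul_tags is the service's tag list joined and surrounded by the
    # tag separator (default ","), e.g. ",prometheus_exporter,web,".
    regex: .*,prometheus_exporter,.*
    action: keep
```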
Alertmanager configuration file:
Logs:
Logs don't provide relevant information.
FD Usage:

    $ lsof -u prometheus | grep consul | wc -l
    4518

process_open_fds on the host: (graph)
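As a side note, FD exhaustion like this can also be caught early by alerting on the process metrics themselves; a minimal sketch, assuming the Prometheus server scrapes itself under job="prometheus" (the group and alert names are made up):

```yaml
groups:
  - name: fd-usage                      # hypothetical rule group
    rules:
      - alert: PrometheusFDsNearLimit   # hypothetical alert name
        # Fires when the process has used more than 80% of its file descriptor limit.
        expr: process_open_fds{job="prometheus"} / process_max_fds{job="prometheus"} > 0.8
        for: 15m
        annotations:
          summary: 'Prometheus is using over 80% of its file descriptor limit'
```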
On July 12 we upgraded from the 2.0.0 release to 2.3.2.
The drops in the graph are service restarts.
There was already a closed issue about this: https://github.com/prometheus/prometheus/issues/3096