Prometheus can leak goroutines when scraping targets over unreliable connection #5394
Great, we've been trying to track this down for a long time!
BenoitKnecht added a commit to BenoitKnecht/common that referenced this issue Mar 21, 2019
BenoitKnecht referenced this issue Mar 21, 2019 (Merged): config: set TLSHandshakeTimeout in HTTP transport #179
BenoitKnecht added a commit to BenoitKnecht/common that referenced this issue Mar 21, 2019
brian-brazil added a commit to prometheus/common that referenced this issue Mar 21, 2019
Seems fixed in prometheus/common#179?
This needs a new tag for prometheus/common and an update of the dependency in this repo.
I'll be doing that tomorrow.
As I mentioned, the … Should I open an issue for that in …?
Blackbox exporter is more complicated to do, as we can't use the current version of common.
brian-brazil closed this in #5344 Apr 10, 2019
BenoitKnecht commented Mar 21, 2019
Bug Report
I noticed that my Prometheus server is leaking goroutines, and that it correlates to established TCP connections:
The TCP connections in question all go to the same host, which is on the other end of a pretty unreliable VPN tunnel.
This is the top of the goroutine profile for this Prometheus server:
The relevant part of net/http.(*persistConn).addTLS() is:

As you can see, it can hang forever if TLSHandshakeTimeout is set to zero. And in Prometheus, the HTTP client's transport comes from github.com/prometheus/common/config.NewRoundTripperFromConfig(), where TLSHandshakeTimeout isn't explicitly set (so it defaults to zero):

I tried setting TLSHandshakeTimeout to 10 seconds, and ran the patched and unpatched versions of Prometheus with the same configuration.

Here are the goroutine and TCP sessions graphs for the patched and unpatched versions. And the top of the goroutine profile for the patched version:
I'm going to submit a PR for github.com/prometheus/common to set TLSHandshakeTimeout to the same value as net/http.DefaultTransport.

Other Prometheus components may be affected. I've seen the same issue in the blackbox_exporter at least.

Environment