Trusted Cluster goroutine leak #10648
Comments
Did a bit more digging on this and it ended up being caused by two things:
Blocking indefinitely writing to c.CertAuthorityC
teleport/lib/services/watcher.go
Line 1011 in 5023235
c.CertAuthorityC <- casToSlice(c.host, c.user)
The correct behavior would be to select on writing to c.CertAuthorityC and reading from ctx.Done. Note that the only other watchers that currently do this are the ProxyWatcher and LockWatcher. Both the DatabaseWatcher and AppWatcher also have the same blocking bug.
select {
case c.CertAuthorityC <- casToSlice(c.host, c.user):
case <-ctx.Done():
}
HTTPClient hanging indefinitely waiting for a response
CertAuthorityWatcher hangs waiting for a response from GetCertAuthorities and RotateExternalCertAuthority. This is caused by two things. First, we don't set a Timeout on the http.Client used by HTTPClient.
Line 187 in 5023235
roundtrip.HTTPClient(&http.Client{Transport: transport}),
While we do set some timeouts on the http.Transport used by said http.Client, that doesn't appear to be enough to prevent hanging forever in some situations.
Second, we don't properly propagate the caller's context all the way down to the outgoing requests. So when the caller's context is cancelled, there is nothing to stop the request.
Line 273 in bac0ccd
return httplib.ConvertResponse(c.Client.PostJSON(context.TODO(), endpoint, val))
Line 288 in bac0ccd
return httplib.ConvertResponse(c.Client.Get(context.TODO(), u, params))
Fix goroutine and memory leak in watchCertAuthorities (#10871)
The CA Watcher was blocking both on writing to a channel when the watcher was closed and on HTTP calls that had no request timeout or context passed to cause cancellation.
All resourceWatcher implementations with a bug that could cause them to block forever on a channel write were fixed by selecting on both the write and ctx.Done.
Add context.Context to all Get/Put/Post/Delete methods on the auth HTTPClient to force callers to propagate context. Previously, all calls used context.TODO, which prevented requests from being properly cancelled.
Add context propagation to RotateCertAuthority, RotateExternalCertAuthority, GetCertAuthority, and GetCertAuthorities. This is needed to get the correct ctx from the CertAuthorityWatcher all the way down to the HTTPClient that makes the call.
Closes #10648
Description
It looks like watchCertAuthorities is not returning when trusted clusters are removed, which in turn prevents the underlying CertAuthorityWatcher from ever being closed and causes a memory leak. Both leaks can be seen in the screenshot below:
The goroutine profile here was taken after deleting the clusters:
What happened:
While running the 500 trusted clusters tests I noticed that after the trusted clusters were deleted there was a leak of both memory and goroutines.
What you expected to happen:
When the trusted clusters are removed, all resources should return to levels they were at prior to adding the clusters.
Reproduction Steps
As minimally and precisely as possible, describe step-by-step how to reproduce the problem.
Server Details
teleport version: 9.0.0-beta.1
Client Details
tsh version: 9.0.0-beta.1