integrations/operator: re-use the teleport client instead of creating a new one #34050
Looks good to me - let's put that RWMutex into the Memory Destination though.
See slack, just realised where the root of your memory leak is.
return b.cachedClient, nil
}

freshClient, err := b.clientBuilder(ctx)
are we ever closing the client?
This is the leak I spotted - the problem also exists in the existing code. We can probably adjust this PR to fix this since the other content of the PR is still valuable.
Fixed in 13de098
This is not super clean, and we'll definitely want to get rid of it once tbot sends us clients with in-place cert renewal. However, the tbot changes won't be backported to v12/v13, so we need the current fix for those versions.
lgtm
@hugoShaka See the table below for backport results.
… a new one (#34050) * integrations/operator: re-use the teleport client instead of creating a new one * fix race condition * address feedback + add godocs
…eating a new one (#34431) * integrations/operator: re-use the teleport client instead of creating a new one (#34050) * integrations/operator: re-use the teleport client instead of creating a new one * fix race condition * address feedback + add godocs * fixup! integrations/operator: re-use the teleport client instead of creating a new one (#34050)
Fixes #24110
This PR addresses several major issues of the Teleport Operator:
This should fix the ongoing memory issues several users reported, largely reduce the impact of broken reconciliation, and reduce the memory spikes when reconciling many resources. Another PR from @tigrato will reduce the number of unnecessary reconciliations. With both PRs we should be in a much better place in terms of CPU/memory load on the operator side.
How it works
With this PR the embedded tbot now caches the client and can skip the whole connection dance if the certs have not changed. In the future, tbot will return us a single client with rolling certificates, which will simplify the whole thing.
This PR also wraps the teleport client in a new structure that contains an RWLock and tracks who is using the client. This allows us to ensure no one is using the client before closing it.
Finally, this PR adds an RWLock on tbot's memory destination to ensure safe reads from the sidecar (we don't want to read while tbot is writing the renewed cert, this would end badly).
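A minimal sketch of a lock-protected in-memory destination, assuming a simple name-to-bytes store. The `memoryDestination` type and its `Read`/`Write` signatures are illustrative assumptions and are not tbot's actual API.

```go
package main

import "sync"

// memoryDestination is a hypothetical in-memory artifact store.
type memoryDestination struct {
	mu    sync.RWMutex
	store map[string][]byte
}

func newMemoryDestination() *memoryDestination {
	return &memoryDestination{store: make(map[string][]byte)}
}

// Write stores a copy of data under name, excluding all readers while it runs.
func (m *memoryDestination) Write(name string, data []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.store[name] = append([]byte(nil), data...)
}

// Read returns a copy of the stored data. Concurrent reads are allowed,
// but a read never overlaps with a write.
func (m *memoryDestination) Read(name string) ([]byte, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data, ok := m.store[name]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), data...), true
}
```

Copying on both write and read keeps callers from mutating the stored bytes outside the lock, so the sidecar would only ever see a fully written certificate.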
changelog: The operator reuses its connection to Teleport. Reduces CPU usage, logs, and fixes a memory leak.