
[Delta server] Properly close watches when a stream is closed #570

Merged 1 commit into envoyproxy:main on Jul 11, 2022

Conversation

@valerian-roche (Contributor) commented on Jul 1, 2022

We noticed in our environments that control planes with no activity other than serving xDS through delta ADS to a constant number of pods showed ever-growing memory consumption.
Investigating further, we noticed that:

  • the memory increase correlated with an increase in the number of goroutines (even though the number of clients is constant)
  • the increase occurs each time we restart those client pods (done on purpose to test some edge cases)
  • the increase is two goroutines per pod, which makes sense as those pods use LDS and CDS as dynamic resources and do not trigger RDS, EDS, or RTDS watches

Taking a goroutine dump, we identified that the leaked goroutines come from https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L180
Those goroutines are only garbage-collected once the channel has either delivered a response or been closed. This closure is expected to happen in watch.Cancel(), but when a node is removed we only call the cancel function attached to the watch (if set); we never close the channel.
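For illustration only, here is a minimal sketch of the leak pattern and the fix; the `watch` and `deltaResponse` names below are hypothetical simplifications, not the actual go-control-plane types:

```go
package main

import (
	"fmt"
	"time"
)

// deltaResponse stands in for the cache's delta response type (illustrative).
type deltaResponse struct{}

// watch pairs the response channel with the cache-side cancel callback.
type watch struct {
	responses chan deltaResponse
	cancel    func()
}

// Cancel is where the fix belongs: call the cache-side cancel function
// AND close the response channel so the forwarding goroutine can exit.
func (w *watch) Cancel() {
	if w.cancel != nil {
		w.cancel()
	}
	if w.responses != nil {
		close(w.responses)
	}
}

func main() {
	w := &watch{
		responses: make(chan deltaResponse),
		cancel:    func() { /* cancel the watch registered in the cache */ },
	}

	// The per-watch goroutine blocks until the channel either delivers a
	// response or is closed. If node removal only calls w.cancel() without
	// close(w.responses), this goroutine is never released.
	go func() {
		if resp, more := <-w.responses; more {
			_ = resp // forward the response to the stream
		}
		fmt.Println("watch goroutine exited")
	}()

	w.Cancel() // closing the channel lets the goroutine return
	time.Sleep(100 * time.Millisecond)
}
```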

Disclaimer: we use a slightly different fork of the repository. While I do not believe it has an impact here, we are not running the vanilla control plane today.

[Figure: goroutine count with/without the fix over time]
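For context, a minimal, purely illustrative way to track this kind of growth (not part of this change) is to expose pprof for goroutine dumps and periodically log runtime.NumGoroutine():

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/goroutine on the default mux
	"runtime"
	"time"
)

func main() {
	// Expose pprof so a goroutine dump (e.g. /debug/pprof/goroutine?debug=2)
	// can show which frames the leaked goroutines are parked in.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// Poll the goroutine count; with the leak, it grows by two for every
	// client pod restart instead of staying flat.
	for range time.Tick(30 * time.Second) {
		log.Printf("goroutines: %d", runtime.NumGoroutine())
	}
}
```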

Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
@alecholmez (Contributor) left a comment:


Thanks for this change! Glad we can clean up this leak.

@alecholmez alecholmez merged commit 6efcb69 into envoyproxy:main Jul 11, 2022
@Hexta mentioned this pull request on Sep 6, 2022