proxycfg: rate-limit delivery of config snapshots #14960
Conversation
Force-pushed 52032ec to 621df51
Looks good. I made a few suggestions related to style guide consistency.
// This runs in another goroutine so we can't just do the send
// directly here as access to snap is racy. Instead, signal the main
// loop above.
sendCh <- struct{}{}
If ctx is canceled while there's a pending update, this goroutine will never halt. The fix is simple, but for posterity, I'll add it in a follow-up PR once this one is merged.
I personally try to avoid time.AfterFunc and instead just use regular timers which can be selected on in the main loop.
Agreed. AfterFunc definitely hides the goroutine here. We had a similar leak related to memdb's WatchCh method: #14916.
Force-pushed 621df51 to e9feaa8
Adds a user-configurable rate limiter to proxycfg snapshot delivery, with a default limit of 250 updates per second. This addresses a problem observed in our load testing of Consul Dataplane where updating a "global" resource such as a wildcard intention or the proxy-defaults config entry could starve the Raft or Memberlist goroutines of CPU time, causing general cluster instability.
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Force-pushed ffec272 to db06c61
agent/proxycfg/state.go
Outdated
scheduleUpdate := func() {
	r := s.rateLimiter.Reserve()

	coalesceTimer = time.AfterFunc(coalesceTimeout+r.Delay(), func() {
Should this instead wait for max(coalesceTimeout, r.Delay())? The coalesce timeout ensures this state's snapshot doesn't get emitted too frequently, while r.Delay ensures that we don't emit snapshots too frequently globally. This might be sort of a premature optimization but figured I would note it anyways.
Good point! I think either is fine honestly, but I guess using the max could save 200ms of unnecessary waiting - will change it 🙇🏻♂️
Saves 200ms of unnecessary waiting in cases where the snapshot delivery is rate-limited.
Description

Adds a user-configurable rate limiter to proxycfg snapshot delivery, with a default limit of 250 updates per second.

This addresses a problem observed in our load testing of Consul Dataplane where updating a "global" resource such as a wildcard intention or the proxy-defaults config entry could starve the Raft or memberlist goroutines of CPU time, causing general cluster instability.

Testing & Reproduction steps

Tested using the Consul Dataplane load test harness. Observed that creating a wildcard intention with 5k connected streams no longer triggers missed memberlist pings (and suspicions from other peers).