Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connect: limit agent cache size #4968

Open
banks opened this issue Nov 16, 2018 · 0 comments
Open

connect: limit agent cache size #4968

banks opened this issue Nov 16, 2018 · 0 comments
Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/enhancement Proposed improvement or new feature type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic

Comments

@banks
Copy link
Member

banks commented Nov 16, 2018

Currently cache can grow without bound. I think there are several aspects we should address here but not sure how yet.

Reduced TTLs

Currently all our background refresh cache types (certs, intentions, service disco) have a TTL of 3 days which was chosen to ensure we keep them in memory during a weekend outage of servers. proxycfg.Manager (added in 1.3.0) actually invalidates that rationale since any resources needed by currently registered proxies are constantly blocked on and so won't be considered inactive. So we could set TTLs much more aggressively

This is important because of the partition by token behaviour - a ideal application/proxy that rotates it's ACL token every hour will generate all new cache entries each hour but we'll keep old ones around for 3 days consuming 72x the memory we actually need.

It's actually not that hard to do safely now for any of the existing Connect use-cases. If we want to support connect clients that don't block on their certs but poll then the TTL only really needs to be as long as the maximum poll frequency we consider reasonable - 30-60 mins seems like more than enough for anything that is trying to use Connect. In the case the client app crashes or is down from more than that TTL, it would result in a new cert being generated when it came back but that doesn't seem awful.

General Memory Limit

In general it would be nice to have a configurable limit on cache memory -- ideally in bytes but at least in number of entries. Then we'd want LRU behaviour to prevent against accidental or malicious pathological cases that chew through cached requests. (e.g. something using ?cached service discovery queries is configured to watch thousands of different service prefixes across tens of datacenters and rotates it's token every few hours...). This could in worst case end up consuming the same RAM on the client as the whole Raft state * number of token rotations/distinct client tokens used to fetch within the TTL.

This will require measuring or estimating cached response size though which is not impossible but maybe needs additional work on the interface since currently the cache just stores the result as an interface{}

@banks banks added type/enhancement Proposed improvement or new feature theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies labels Nov 16, 2018
@pearkes pearkes added this to the 1.4.3 milestone Feb 1, 2019
@mkeeler mkeeler modified the milestones: 1.4.3, 1.4.4 Mar 4, 2019
@hanshasselberg hanshasselberg modified the milestones: 1.4.4, 1.5 Mar 19, 2019
@pearkes pearkes modified the milestones: 1.5.0, 1.5.1 Apr 29, 2019
@mkeeler mkeeler modified the milestones: 1.5.1, 1.5.2 May 23, 2019
@pearkes pearkes modified the milestones: 1.5.2, 1.5.3 Jul 2, 2019
@freddygv freddygv self-assigned this Jul 9, 2019
@pearkes pearkes modified the milestones: 1.5.3, Upcoming Jul 22, 2019
@freddygv freddygv removed their assignment Jul 30, 2019
@pearkes pearkes modified the milestones: Upcoming, 1.7.x Aug 20, 2019
@mikemorris mikemorris added the type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic label Jun 30, 2020
@mikemorris mikemorris removed this from the 1.7.x milestone Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/enhancement Proposed improvement or new feature type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic
Projects
None yet
Development

No branches or pull requests

6 participants