Commit
fix(kubernetes): Improve failure mode for unreachable cluster (#3770)
* fix(kubernetes): Improve failure mode for unreachable cluster

  We currently cache any call to get the namespaces for an account with an expiry time of 30s, using a memoized supplier. When a cluster is unreachable, the call to get the cluster's namespaces hangs and eventually times out; we then log a warning and return an empty array of namespaces.

  If the call to kubectl returns an error, we don't cache the empty list return value, so every call to get namespaces calls kubectl again. This leads to a bad failure mode where a slow or unresponsive cluster receives more calls than a fast, responsive one.

  To address this, when a call to get namespaces returns an error, cache the empty list we're returning for the same amount of time as a successful call.

* fix(kubernetes): Use custom memoizer for kubectl calls

  We're currently using a Guava memoized supplier for calls to get namespaces and CRDs in a cluster, with an expiration time of 30s. The Guava memoizer records the timestamp when it starts executing the supplier function rather than when the function completes. This means that if the function to get namespaces takes more than 30s, we never get a cache hit at all, because the entry has already expired by the time it is added to the cache. The cache is therefore least effective exactly when it is most necessary.

  Instead, write a small Memoizer class that wraps a Caffeine cache, since Caffeine marks cache entries at the time of insertion into the cache (after the work is finished) rather than when the work starts. Use this for caching kubectl calls instead of the Guava cache.
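The two behaviours described above (caching the empty fallback on failure, and timing expiry from when the work completes) can be illustrated with a minimal sketch. This is not the Memoizer class from the commit, which wraps a Caffeine cache; it is a dependency-free stand-in written in plain Java, and the class name `CompletionTimedMemoizer`, its constructor parameters, and the injectable clock are all hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: memoizes a supplier for a fixed TTL, starting the TTL
 * when the computation finishes rather than when it starts. A failed
 * computation caches the fallback value for the same TTL.
 */
final class CompletionTimedMemoizer<T> implements Supplier<T> {
  private final Supplier<T> delegate;  // the expensive call, e.g. a kubectl invocation
  private final T fallback;            // value cached when the delegate throws
  private final long ttlNanos;         // how long a result stays valid
  private final LongSupplier clock;    // nanosecond clock; injectable so it can be faked

  private T value;
  private long expiresAtNanos;
  private boolean loaded;

  CompletionTimedMemoizer(
      Supplier<T> delegate, T fallback, long ttlNanos, LongSupplier clock) {
    this.delegate = delegate;
    this.fallback = fallback;
    this.ttlNanos = ttlNanos;
    this.clock = clock;
  }

  @Override
  public synchronized T get() {
    // Subtraction keeps the comparison correct even if the nanosecond clock wraps.
    if (loaded && clock.getAsLong() - expiresAtNanos < 0) {
      return value;
    }
    T result;
    try {
      result = delegate.get();
    } catch (RuntimeException e) {
      // Cache the fallback too, so a failing cluster is not called on every request.
      result = fallback;
    }
    value = result;
    // Read the clock again AFTER the work is done, so a slow call still
    // produces an entry that lives for the full TTL.
    expiresAtNanos = clock.getAsLong() + ttlNanos;
    loaded = true;
    return value;
  }
}
```

In real use the clock argument would typically be `System::nanoTime`. Holding the lock while the delegate runs means concurrent callers wait for a single in-flight call instead of each issuing their own, which is a deliberate simplification for this sketch rather than a claim about how the commit handles concurrency.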