rax-maas · iWebi · Aug 13, 2022 · Aug 10, 2022
diff --git a/blueflood-core/src/main/java/com/rackspacecloud/blueflood/cache/TokenCache.java b/blueflood-core/src/main/java/com/rackspacecloud/blueflood/cache/TokenCache.java
@@ -12,6 +12,36 @@
 
 import java.util.concurrent.TimeUnit;
 
+/**
+ * Caches tokens that have been written to Elasticsearch. Like the {@link LocatorCache}, this is an in-memory cache
+ * that guards Elasticsearch from excessive writes during ingestion. When a token is successfully indexed, it's cached
+ * here, and a token that is present in the cache won't be written to Elasticsearch again. The logic that checks and
+ * populates this cache is implemented in ElasticTokensIO.
+ *
+ * "Tokens" are the dot-separated pieces of a hierarchical metric name. See {@link Token} for more details.
+ * ElasticTokensIO caches only the non-leaf tokens under the assumption that locator caching/guarding happens
+ * beforehand, specifically in the {@link com.rackspacecloud.blueflood.inputs.processors.DiscoveryWriter}. If a locator
+ * has already been run through discovery, all its tokens will have been processed as well.
+ *
+ * Since the locator cache is checked before tokens are processed, this cache might seem redundant. When a node starts
+ * or restarts, however, every locator appears new, and a huge number of tokens is generated. Many of those locators
+ * share the same non-leaf tokens, so during that burst this cache is heavily used and greatly reduces the number of
+ * tokens that need to be indexed. Once the initial burst has passed, the entries in this cache begin to expire and
+ * won't be replaced.
+ *
+ * If a locator is present in the locator cache, we won't attempt to index its tokens either. This is good, but with a
+ * perfect locator cache, it means no calls will ever reach the token cache. Unfortunately, the current cache
+ * implementation only evicts expired entries when the cache is accessed, so in this situation a lot of expired tokens
+ * can sit in the token cache without ever being cleaned out. If token cache usage stays high after its TTL has
+ * elapsed, this is probably what's happening.
+ *
+ * TODO: Fix the issue described above. One option is to find another cache implementation that actively removes
+ *       expired entries. Another is to start a thread that periodically runs a cleanup on the cache. Note that
+ *       the caches used for throttling in DiscoveryWriter have a similar problem and could benefit from periodic
+ *       cleanups instead of the current implementation's inline cleanups.
+ *
+ * See the LOCATOR_CACHE_* and TOKEN_CACHE_* settings in {@link CoreConfig} to tune the caches.
+ */
 public class TokenCache {
 
     // this collection is used to reduce the number of tokens that get written.