Merge pull request #907 from rax-maas/token-cache-doc
document TokenCache
iWebi committed Aug 13, 2022
2 parents f0b6303 + 67c22ab commit 3d338d8
Showing 1 changed file with 30 additions and 0 deletions.
@@ -12,6 +12,36 @@

import java.util.concurrent.TimeUnit;

/**
* Caches tokens that have been written to Elasticsearch. Like the {@link LocatorCache}, this is a memory cache that
* guards Elasticsearch from excessive writes during ingestion. When a token is successfully indexed, it's cached here.
* If a token is present in the cache, it won't be written to Elasticsearch again. This functionality is implemented in
* ElasticTokensIO.
*
* "Tokens" are the separate, dot-separated pieces of a hierarchical metric name. See {@link Token} for more details.
* ElasticTokensIO caches only the non-leaf tokens under the assumption that locator caching/guarding happens
* beforehand, specifically in the {@link com.rackspacecloud.blueflood.inputs.processors.DiscoveryWriter}. If a locator
* has already been run through discovery, all its tokens will have been processed as well.
*
* Since the locator cache is checked before processing tokens, it might seem that this cache is redundant. In the
* special case of a node start/restart, however, every locator appears new, and a huge number of tokens is generated.
* At that time, this cache is heavily used and greatly reduces the number of tokens that need to be indexed. It's also
* worth pointing out that after that initial burst of activity, the entries in this cache will begin expiring out and
* won't be replaced.
*
* If a locator is present in the locator cache, then we won't attempt to index tokens for it either. This is good, but
* in the case of a perfect locator cache, it means no calls will ever reach the token cache. Unfortunately, the current
* cache implementation only removes expired entries when you actually access the cache, so this situation can lead to a
 * lot of tokens sitting in the token cache that are never cleaned out. If the token cache still shows high usage
 * after its TTL has elapsed, you're probably in this situation.
*
* TODO: Fix the issue described above. One option is finding another cache implementation that will actively remove
* expired entries. Another option would be starting a thread to periodically run a cleanup on the cache. Note
* that the caches used for throttling in DiscoveryWriter have a similar problem and could benefit from periodic
 *       cleanups instead of the inline cleanups of the current implementation.
*
 * See the LOCATOR_CACHE_* and TOKEN_CACHE_* settings in {@link CoreConfig} to tune the caches.
*/
public class TokenCache {

// this collection is used to reduce the number of tokens that get written.
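The guard-before-write behavior described in the Javadoc, together with the periodic cleanup suggested in the TODO, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `TokenWriteGuard`, its map-based TTL store, and the sweeper thread are all hypothetical names, not Blueflood's actual implementation (which, per the comment, lives in ElasticTokensIO and uses the CoreConfig TTL settings).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a TTL-based write guard plus an active periodic cleanup.
 * Illustrative only; not Blueflood's real TokenCache.
 */
public class TokenWriteGuard {

    // token -> timestamp (millis) of the last successful index write
    private final Map<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TokenWriteGuard(long ttl, TimeUnit unit) {
        this.ttlMillis = unit.toMillis(ttl);
    }

    /** True if the token is absent or expired, i.e. it should be written to the index. */
    public boolean shouldWrite(String token) {
        Long cachedAt = seen.get(token);
        return cachedAt == null
                || System.currentTimeMillis() - cachedAt >= ttlMillis;
    }

    /** Record a token after it has been successfully indexed. */
    public void markWritten(String token) {
        seen.put(token, System.currentTimeMillis());
    }

    /** Drop expired entries even when nothing is reading the cache. */
    public void cleanUp() {
        long cutoff = System.currentTimeMillis() - ttlMillis;
        seen.entrySet().removeIf(e -> e.getValue() < cutoff);
    }

    /** One way to implement the TODO: a background thread that sweeps expired entries. */
    public ScheduledExecutorService startSweeper(long period, TimeUnit unit) {
        ScheduledExecutorService sweeper =
                Executors.newSingleThreadScheduledExecutor();
        sweeper.scheduleAtFixedRate(this::cleanUp, period, period, unit);
        return sweeper;
    }
}
```

A background sweeper like `startSweeper` would address the scenario the Javadoc warns about: a perfect locator cache starving the token cache of accesses, leaving expired entries resident because cleanup only runs inline on access.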
