Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to get a read-only snapshot of the FrequencySketch of Cache #313

Open
Tracked by #327
tatsuya6502 opened this issue Sep 2, 2023 · 0 comments
Open
Tracked by #327
Labels
enhancement New feature or request

Comments

@tatsuya6502
Copy link
Member

tatsuya6502 commented Sep 2, 2023

#311 (comment) by @peter-scholtens

Similar as mentioned in #235, I try to store and read-in my Cache content locally during a re-start of my server. However, this means the expiration state will be reset: the frequency is set to zero and the TTL/TTI maximized. Would it be possible to iterate with Expiration state? So this state information can be stored too.

Instead of providing the read frequency of a key in a Cache, we will provide a read-only snapshot of the FrequencySketch, the historic popularity estimator, of the Cache.

  • This will be preferable because we can preserve all the information in the FrequencyScketch.
    • FrequencySketch tracks the access counts of not only the keys that are currently in the cache, but also all the keys that have been attempted to get from the cache. (frequency is incremented for a key for both cache hit and miss).
    • We do not know what the all keys are as the FrequencySketch does not store the keys.
  • The FrequencySketch is a part of the Cache state and guarded by a mutex.
    • We do not want to expose it directly, which may cause lock contentions.
    • Instead, we will provide a read-only snapshot (clone) of it.

By #314, we can recreate a Cache with a given FrequencySketch snapshot and BuildHasher.

For the FrequencySketch snapshot, we will provide the following methods and functions:

  • Get an estimated frequency (u8) of a given key:
    • The stored frequency is unsigned 4-bit integer.
  • Convert a FrequencySketch into a serialized form Box<[u8]>.
    • The Box<[u8]> should contain all the information of the FrequencySketch snapshot, plus:
      • A hard-coded UUIDv4 as the serialize format version identifier.
      • A check sum, which is the hash value of the serialized data calculated by the hasher of the cache.
  • Recreate it from a serialized form.
    • Verify the followings:
      • The hard-coded UUIDv4.
    • Provide validate method, which takes a hasher and verify the check sum.
      • CacheBuilder will call it when building a CacheLoader with this frequency sketch snapshot and a BuildHasher.
use ahash::RandomState;
use moka::sync::Cache;

fn generate_high_qualyty_seeds() -> [u64; 4] {
    // ...
}

let seeds = generate_high_quality_seeds();
let build_hasher = RandomState::with_seeds(seeds[0], seeds[1], seeds[2], seeds[3]);

let cache = Cache::builder()
    .max_capacity(10_000)
    .build_with_hasher(build_hasher);

// Force the cache to enable the frequency sketch because it
// should be disabled when the cache is less than 50% full.
//
// NOTE: If the max capacity of the cache is not set, this will
// be no-op.
cache.policy().enable_frequency_sketch();

cache.get(&0); // => frequency 1
cache.insert(0, 'a');
cache.get(&0); // => frequency 2

// Get a read only snapshot of the frequency sketch.
let sketch = cache.policy().frequency_sketch().unwrap();

sketch.estimated_frequency(&0); // => maybe 2u8
sketch.estimated_frequency(&1); // => maybe 0u8

let serialized = sketch.into_bytes(); // Box<[u8]>

// Rebuild the cache with the serialized frequency sketch.

let deserialized = FrequencySketch::from_bytes(serialized);
// Ensure to use the exact same hasher, because the sketch
// will return correct estimates only when the correct hasher
// is used.
let build_hasher = RandomState::with_seeds(seeds[0], seeds[1], seeds[2], seeds[3]);
let cache = Cache::builder()
    // See https://github.com/moka-rs/moka/issues/314
    .loader_with_frequency_sketch(deserialized, build_hasher)
    .unwrap()
    .finish();

// The cache is empty but it should have the same frequency sketch
// as the previous one.
assert_eq!(cache.entry_count(), 0);
let sketch = cache.policy().frequency_sketch().unwrap();
sketch.estimated_frequency(&0); // => maybe 2u8
sketch.estimated_frequency(&1); // => maybe 0u8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant