Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose raw hash APIs #135

Closed
VladimirBramstedt opened this issue May 23, 2022 · 5 comments
Closed

expose raw hash APIs #135

VladimirBramstedt opened this issue May 23, 2022 · 5 comments

Comments

@VladimirBramstedt
Copy link

I have a use case where i want to cache an object which is keyed by a pretty large object that is deserialized. calculating the hash for this key is already done as part of deserializing and parsing,and it would be a substantial performance gain to not have to rehash it. i saw that there are in fact functions that are currently pub(crate) such as get_with_hash or insert_with_hash. would it be possible to expose them in the public API?
if you dont want those functions completely public for API stability reasons, it would still be nice to expose them behind a feature flag, for those of us who have some specialized use cases :)

@tatsuya6502
Copy link
Member

Hi. Thank you for using Moka.

It would not be necessary to expose these APIs. You can write your oun hasher, which simply pass-through the pre-calculated hash value. And then you can give the hasher to the cache. Here is an example:

use std::{
    hash::{BuildHasherDefault, Hash, Hasher},
    sync::Arc,
};

//
// Define a struct for the key, which holds the key and its hash value.
//

#[derive(Clone, PartialEq, Eq)]
pub struct MyKey {
    _key: Arc<Box<[u8]>>,
    hash_value: u64,
}

impl MyKey {
    pub fn new(key: Box<[u8]>, hash_value: u64) -> Self {
        Self {
            _key: Arc::new(key),
            hash_value,
        }
    }
}

// Implement Hash. We will only use the pre-calculated hash value.
impl Hash for MyKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.hash_value.hash(state);
    }
}

//
// Create a Hasher, which just uses a single u64 value as the final hash.
//

#[derive(Default)]
struct PassthroughHasher(u64);

impl Hasher for PassthroughHasher {
    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }

    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!()
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Define a BuildHasher using our Hasher. We will give this to our cache.
type PassthroughBuildHasher = BuildHasherDefault<PassthroughHasher>;

fn main() {
    let cache = moka::sync::Cache::builder()
        .max_capacity(100)
        // Give our build hasher to this cache.
        .build_with_hasher(PassthroughBuildHasher::default());

    let key1 = MyKey::new(b"key1".to_vec().into_boxed_slice(), 0);
    let key2 = MyKey::new(b"key2".to_vec().into_boxed_slice(), 1);

    cache.insert(key1.clone(), "value1");
    cache.insert(key2.clone(), "value2");

    assert_eq!(cache.get(&key1), Some("value1"));
    assert_eq!(cache.get(&key2), Some("value2"));
}

Will this trick work for you?

@VladimirBramstedt
Copy link
Author

it would, but i would think it is perhaps more ergonomic to provide something similar to the raw_entry API from hashbrown https://docs.rs/hashbrown/latest/hashbrown/raw/struct.RawTable.html#method.insert, etc, especially if these functions are already implemented.
its a bit a question of API, i guess.

@tatsuya6502
Copy link
Member

Unfortunately, exposing sync::Cache's get_with_hash and insert_with_hash does not solve your problem.

Like many other hash tables, Moka cache's internal hash table re-calculates hash values of existing keys when it resizes (reallocates) the storage area for growth. This operation is called rehashing. So even though, you use the pre-calculated hash when inserting and retrieving an entry, the cache will still re-calculate hashes from the keys when the cache grows. If you use a different hash function for pre-calculating the hash to the hash function that the cache is given at the build time, it will break; get_with_hash will no longer be able to locate the key after growing.

On the other hand, the example code above with PassthroughHasher ensures that the hash is never re-calculated from the key, at insertion, retrieval and rehashing times.

@tatsuya6502
Copy link
Member

Here is the rehash method in Moka's internal hash table:

pub(crate) fn rehash<H>(

and this is the line in rehash that re-calculates hash from key.

let hash = hash(build_hasher, key);

@VladimirBramstedt
Copy link
Author

ah, i missed that. thanks a lot :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants