
Some questions on cachelib allocator, pool and pool size #26

Closed · tangliisu opened this issue Jul 30, 2021 · 7 comments

@tangliisu (Contributor)

  1. When constructing the allocator via facebook::cachelib::LruAllocator cache(config), does this throw any errors/exceptions? How do we know the cache was created? Do we only need to check cache == nullptr?
  2. Another question is on pool sizing. For example, I want to give the cache 45 GB and split it into two pools of 30 GB and 15 GB. To assign the pool sizes, I have to do the following:
     cache->addPool(name1, cache_->getCacheMemoryStats().cacheSize * 30 / (30 + 15));
     cache->addPool(name2, cache_->getCacheMemoryStats().cacheSize * 15 / (30 + 15));
     Is there an easier way to account for the overhead?
  3. In our system, we would prefer that the same key can exist in different pools. One way to do this might be to add a prefix (the pool index) to the key name, but that string concatenation happens on every put/get operation, which may degrade performance. Is there a better way to support the same key in different pools?
@therealgymmy (Contributor)

> How do we know the cache was created? Do we only need to check cache == nullptr?

This will throw if there are bad configs (you will see std::invalid_argument). The cache is created if the constructor didn't throw. Typically we recommend creating the cache as a std::unique_ptr<...>, so it is easy to move around and destroy.
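
A minimal sketch of that pattern, assuming a recent CacheLib build (setCacheName/setCacheSize/validate are the documented config setters; the header path and cache name here are illustrative):

```cpp
#include <memory>

#include "cachelib/allocator/CacheAllocator.h"

using Cache = facebook::cachelib::LruAllocator;

std::unique_ptr<Cache> makeCache() {
  Cache::Config config;
  config.setCacheName("example-cache")
      .setCacheSize(45ULL * 1024 * 1024 * 1024)  // 45 GB
      .validate();  // throws std::invalid_argument on a bad config
  // The constructor can also throw; if it returns, the cache exists,
  // so there is no separate nullptr check to do.
  return std::make_unique<Cache>(config);
}
```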

> Is there an easier way to account for the overhead?

Can you clarify this question? What overheads are you referring to?

> Is there a better way to support the same key in different pools?

We currently do not support this. Because we share the index across all memory pools, keys must be unique. On insertion, we could theoretically let you pass us two keys (prefix + actual key) and just memcpy them into the item memory. However, on lookup we have to concatenate, because we need the key to be (prefix + actual key). I suspect the lookup is the expensive part here, and we don't have a good way to avoid it.

Have you measured how much perf overhead this adds? (If keys are small, under ~15 bytes, std::string shouldn't incur a heap allocation.) If the overhead is too high, you should consider using multiple CacheLib instances instead of cache pools.
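
A rough sketch of the multiple-instances alternative mentioned above (names and sizes are illustrative; each instance needs its own cache name, and keys only need to be unique within an instance, so the same key can live in several of them):

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

#include "cachelib/allocator/CacheAllocator.h"

using Cache = facebook::cachelib::LruAllocator;

// One independent LruAllocator per logical "pool".
std::unordered_map<std::string, std::unique_ptr<Cache>> makeCaches() {
  const std::vector<std::pair<std::string, size_t>> specs = {
      {"pool1", 30ULL << 30},  // 30 GB
      {"pool2", 15ULL << 30},  // 15 GB
  };
  std::unordered_map<std::string, std::unique_ptr<Cache>> caches;
  for (const auto& [name, bytes] : specs) {
    Cache::Config cfg;
    cfg.setCacheName(name).setCacheSize(bytes).validate();
    caches.emplace(name, std::make_unique<Cache>(cfg));
  }
  return caches;
}
```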

@tangliisu (Contributor, Author) commented Jul 30, 2021

Thank you for the quick reply!

> Can you clarify this question? What overheads are you referring to?

I don't think we can directly add pools like this:

cache->addPool(name1, 30GB); cache->addPool(name2, 15GB);

My understanding is that there is a fixed overhead needed to manage the cache in CacheLib, so the memory available to the two pools is less than 45 GB. The way I allocate memory to pool1 and pool2 is:

cache->addPool(name1, cache_->getCacheMemoryStats().cacheSize * 30 / (30 + 15));
cache->addPool(name2, cache_->getCacheMemoryStats().cacheSize * 15 / (30 + 15));

but I am not sure if that is good practice. How do you set the pool sizes when multiple pools are needed?
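
One way to generalize this pattern (a minimal sketch using the addPool/getCacheMemoryStats calls quoted above; addPoolsByWeight is a made-up helper name):

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split whatever memory CacheLib actually makes available (after its
// own management overhead) across pools according to relative weights.
void addPoolsByWeight(
    Cache& cache,
    const std::vector<std::pair<std::string, double>>& weights) {
  double total = 0;
  for (const auto& kv : weights) {
    total += kv.second;
  }
  const size_t usable = cache.getCacheMemoryStats().cacheSize;
  for (const auto& [name, w] : weights) {
    cache.addPool(name, static_cast<size_t>(usable * (w / total)));
  }
}

// Usage: addPoolsByWeight(*cache, {{"pool1", 30}, {"pool2", 15}});
```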

> Have you measured how much perf overhead this adds? (If keys are small, under ~15 bytes, std::string shouldn't incur a heap allocation.) If the overhead is too high, you should consider using multiple CacheLib instances instead of cache pools.

We haven't run a perf test yet but will do so soon. Does using multiple CacheLib instances hurt performance or have other disadvantages compared to using multiple cache pools in one CacheLib instance?

@sjoshi6 commented Jul 30, 2021

There are some experimental CacheLib features that might be very useful for us in the future, such as automatic pool resizing and the memory monitor. I believe we won't be able to leverage them well if we have multiple CacheLib instances in one process.

A typical key size we encounter is ~32 bytes.

@sathyaphoenix (Contributor)

I'd recommend not spinning up multiple instances of cachelib and instead using pools to partition the available memory. The motivation for this is in line with what @sjoshi6 brought up. Copying 32 bytes might not be a significant CPU cost compared to leaving fragmented memory that is harder to manage across multiple instances. If you'd like to avoid std::string (since it uses heap allocation past 20 bytes and incurs a malloc + copy), you can still allocate memory on the stack, copy the contents, and wrap the stack memory in an Item::Key to call the find APIs, as long as the calls happen within the same call stack. Lots of performance-critical applications do this trick; see the sketch below.
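
A minimal sketch of that trick, assuming Item::Key is a folly::StringPiece-style view over the key bytes (as in CacheLib) and that prefix + key always fits in the buffer; kMaxKeyLen and lookupInPool are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#include <folly/Range.h>

// Build "poolIndex + key" in a stack buffer and wrap it in a key view.
// The view only has to stay valid for the duration of the synchronous
// find() call, which happens within this same call stack.
auto lookupInPool(Cache& cache, char poolIndex, folly::StringPiece key) {
  constexpr size_t kMaxKeyLen = 255;  // illustrative bound
  assert(key.size() <= kMaxKeyLen);
  char buf[1 + kMaxKeyLen];
  buf[0] = poolIndex;
  std::memcpy(buf + 1, key.data(), key.size());
  return cache.find(folly::StringPiece{buf, 1 + key.size()});
}
```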

@tangliisu (Contributor, Author) commented Aug 3, 2021

@sathyaphoenix that's a great suggestion, thank you! I have another question, on the persistent cache. If we want to enable cache persistence, we have to set up something like the following:

config.enableCachePersistence(path);
Cache cache(Cache::SharedMemNew, config);

Does this local path store only metadata, or the whole old cache instance? If it only stores metadata, can I expect the metadata to be very small?

@sathyaphoenix (Contributor)

> Does this local path store only metadata, or the whole old cache instance? If it only stores metadata, can I expect the metadata to be very small?

It only stores some metadata, which should be less than a KB. It can live on any file system and is not performance critical. All the data and any heap metadata are persisted either through shared memory or on-device. The metadata stored in files under the cache directory is just enough information to recover all the other pieces.
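
For context, the usual pattern around this is to try attaching to a previously persisted cache and fall back to creating a fresh one (a hedged sketch; SharedMemAttach and SharedMemNew are the documented constructor tags, while the directory path is illustrative):

```cpp
#include <exception>
#include <memory>

std::unique_ptr<Cache> initPersistentCache(Cache::Config config) {
  // The directory only holds the small metadata files described above.
  config.enableCachePersistence("/var/cache/my-service");
  try {
    // Attach to a cache persisted by a previous run, if one exists.
    return std::make_unique<Cache>(Cache::SharedMemAttach, config);
  } catch (const std::exception&) {
    // No usable persisted cache: create a brand new one.
    return std::make_unique<Cache>(Cache::SharedMemNew, config);
  }
}
```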

@sathyaphoenix (Contributor)

@tangliisu I'll close this ticket since the original questions are answered. Please feel free to open a new one if you have any additional questions :)
