HybridCache Output #106
If you were using a synthetic workload generator, the number of items in the cache would be controlled by the "numKeys" parameter that cachebench uses to generate the workload. If that is small enough to fit in the cache, then the cache will never become full. I see that you are using a trace file here, so it is likely that the total number of unique keys in the trace file does not exceed the cache size. Can you verify whether that's the case?
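For reference, a minimal sketch of where "numKeys" sits in a synthetic-workload cachebench config (the values here are illustrative, not recommendations):

{
  "test_config": {
    "numKeys": 1000000,
    "numOps": 10000000,
    "numThreads": 8
  }
}

If numKeys is small enough that every unique key fits in DRAM + NVM, no evictions ever happen and the cache never fills.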
I can verify that it is not the case, since the number of items in the cache increases as I increase the cache size. Also, since there are cache evictions, there must be items that didn't fit. I suspected it was because of the different allocation classes, so I adjusted the allocation size manually, which worked. So now I see the following for this configuration: { 22:40:34 68453 ( 0.07M) ops completed

But I have a few questions about allocation sizes. For "allocSizes", since I am adding fixed 4KB pages, why do I have to use the value 4135 for items to be admitted to the cache? I used a lower value and no items got admitted. The same goes for "navySizeClasses", where I had to use the first value larger than 4096, which is 4096 + 512, the navyBlockSize. Is the key size also included in the allocation size class measurement? I thought it was only the data size.

One more question, about memory usage: when around 10% of the simulation is done, memory usage is already around 3GB, and it continues to increase. Isn't that too high?

MiB Mem : 122748.8 total, 110476.4 free, 2999.3 used, 9273.1 buff/cache
154419 root 20 0 2868816 2.0g 62592 S 102.7 1.6
allocSizes for DRAM is a size that includes key + data + Item overhead. If you don't specify one, cachebench has defaults, which I believe use a 1.25 growth factor. For Navy, this is similar for LOC if you don't use stack allocation; SOC does not need alloc sizes since everything is stack allocated. The best option, if you don't care about size classes for Navy, is to use stack allocation mode (i.e. specify an empty array for "navySizeClasses") and set "truncateItemToOriginalAllocSizeInNvm" so that any extra unused space from the DRAM Item is truncated when writing to SSD. See https://cachelib.org/docs/Cache_Library_User_Guides/Configuring_cachebench_parameters#large-item-engine-parameters for more info.
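To make that concrete, a minimal sketch of the suggested changes, keeping the rest of the posted configuration as-is:

"cache_config": {
  "navySizeClasses": [],
  "truncateItemToOriginalAllocSizeInNvm": true
}

With an empty "navySizeClasses", BlockCache runs in stack allocation mode, and the truncate flag avoids carrying the DRAM Item's slack space onto SSD.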
It does sound high for a cacheSizeMB of 100. Cachebench itself uses some memory when consistency-checking mode is used, but for trace replay that should be relatively small. CacheLib's memory usage should mostly come from the hash table for the DRAM cache and the sparse map for Navy. I doubt these could add up to 3GB unless the htBucketPower you chose is misconfigured.
I see. Thank you for the suggestions, really helpful! I use the default value for htBucketPower, which is 22 (uint64_t htBucketPower{22}; // buckets in hash table). Could you explain a little how this could relate to the increased memory usage?
An htBucketPower of 22 means that the hash table has 2^22 slots, 4 bytes each. That should account for about 16MB. The sparse map in Navy is about 10 bytes per entry. If you can get a heap profile, it would help you chase down the source of the memory usage.
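Plugging in the numbers reported in this thread, the expected overhead from these two structures is roughly (back-of-the-envelope, not an exact accounting):

hash table : 2^22 slots * 4 bytes    = 16 MiB
sparse map : 1,958,200 items * ~10 B ≈ 20 MB

That totals well under 40MB, nowhere near the ~3GB observed, so the bulk of the usage must be coming from somewhere else.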
For the following stats: 00:01:58 107561 ( 0.11M) ops completed. Memory usage is 2.8GB, with 1.7G resident. The overhead from the buckets and the Navy entries seems too low to account for the memory usage, which keeps increasing and is off by almost 1GB. I will try to look into it.
I forgot to ask: what does stack allocation mode do? Also, why does the success rate of set drop so low under high pressure? I have a workload where it has gone down to as low as 54%. == Throughput Stats ==
This mode only applies to BlockCache (Navy's large item cache). It writes items of different sizes into the same region (a region is the granularity at which we write and evict in BlockCache). This is as opposed to size class mode, where items of the same size are written into the same region. We recommend that users use stack alloc + in-mem buffers, as we will be deprecating the other write modes in the near future.
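A rough sketch of the difference in on-device layout (sizes illustrative, not to scale):

size-class mode : region 0 -> [4K][4K][4K][4K]...   region 1 -> [8K][8K][8K]...
stack alloc mode: region 0 -> [3.1K][7.8K][4K][5.2K]...

In both modes a region is still written and evicted as one unit; stack allocation just packs mixed sizes into it.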
This indicates allocation failures when trying to insert into the RamCache. Can you describe your workload and cache setup in more detail? Do the object sizes span a large range? What is the RAM-cache size? One possible scenario is that the RAM cache is too small and the object sizes span a vast range. This could lead to a lot of contention on allocation classes that have little memory (and thus alloc failures).
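If the sizes do span a range, one thing to try is giving the allocator several classes so that no single class is starved. A sketch using the "allocSizes" knob discussed earlier in this thread; the values are illustrative and should be chosen to match the actual size distribution:

"cache_config": {
  "allocSizes": [512, 1024, 2048, 4135, 8192]
}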
@pbhandar2 Marking this as resolved. Please open a new discussion or issue if there are any more unaddressed issues.
Hello,
I am using HybridCache to extend the size of the cache to fit the workload.
I am writing fixed-size pages of 4096 bytes to the cache. However, even when the NVM cache size is set to 50GB, the number of objects in the cache does not exceed 1,958,200, which amounts to 1,958,200 * 4096 bytes of data, only around 7.5GB. I notice a similar pattern with DRAM usage, where the number of 4096-byte items is lower than it should be based on the allocation. Is there an allocation setting that I am not setting to optimize for fixed 4096-byte pages, and that is leading to fragmentation?
This is the configuration that I am using. Please ignore the additional parameters that I have added.
{
"cache_config": {
"cacheSizeMB": 100,
"minAllocSize": 4096,
"navyBigHashSizePct": 0,
"nvmCachePath": ["/flash/cache"],
"nvmCacheSizeMB": 50000,
"navyReaderThreads": 32,
"navyWriterThreads": 32,
"navyBlockSize": 4096,
"navySizeClasses": [4096, 8192, 12288, 16384]
},
"test_config": {
"enableLookaside": "true",
"generator": "block-replay",
"numThreads": 1,
"traceFilePath": "/home/pranav/csv_traces/w81-w85.csv",
"traceBlockSize": 512,
"diskFilePath": "/disk/disk.file",
"pageSize": 4096,
"minLBA": 0
}
}
== Allocator Stats ==
Items in RAM : 17,711 (the 100MB allocation should fit more items!)
Items in NVM : 1,958,200 (the 50GB allocation only fits ~2 million 4KB pages)
Alloc Attempts: 122,793,861 Success: 100.00%
RAM Evictions : 115,587,253 (Why is each eviction not being admitted into the NVM cache? If it is, why are items in NVM limited to 1,958,200?)
Cache Gets : 113,169,162