Rebalancing errors leading to a crash #112
Comments
Yeah. It sounds like a code path expects a compact cache to be present, but the pool it finds is in fact not a compact cache. I suppose in your setup you don't create any compact caches and have just a single default pool? One suggestion would be to stop the pool rebalancer explicitly on startup through this api and see if the problem resurfaces as a different one.
This is thrown by the api here and, from my quick digging, it should not be called by any pool rebalancing code.
Thanks @sathyaphoenix. I indeed don't create a compact cache and I indeed only use a single pool (the code above is practically all the code that uses CacheLib). Stopping the pool rebalancer as you suggested allowed my runs to complete without an issue. I don't know if it solved the real issue or just hid it, but it's a step forward. What's the impact of having no pool rebalancer? Is it an acceptable setup for a single cache and a single pool? I wanna make sure I'm not degrading performance with that.
@erangi: running without pool rebalancing may cause memory imbalance (e.g. some allocation sizes have no memory to allocate while others have excessive free memory). Can you share your cache setup and repro instructions? (Both with the rebalancing crash and with the change that made it go away.) Looking at the error message, it smells like corruption if you just have a single pool. The message suggests there exists a PoolId 80 (the 81st pool created).
@therealgymmy the cache setup is in my first post, it's pretty straightforward. The only unconventional thing I do is wrap this client code in JNI so I can benchmark CacheLib using YCSB (which is Java). The extended YCSB and JNI wrapper are WIP so I'd rather not publicly share them, but I can share them privately if that helps. I agree this looks like memory corruption, but it's consistent over runs and machines (except the pool ID, which is often 80 but varies). FWIW, our benchmarks use fixed key and value sizes (less realistic but simpler...), so I don't think this synthetic use case will be very challenging for a memory manager.
@erangi: if it's fixed key and value sizes, then you can disable the rebalancer. It won't do much anyway.
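To illustrate why a fixed-size workload leaves the rebalancer little to do, here is a toy model. It is not CacheLib code: `pickClass`, `distinctClassesUsed`, and the class sizes are hypothetical, and it only assumes that each request is served by the smallest allocation class that fits it. Rebalancing moves memory between allocation classes, so if every request lands in the same class there is nothing to move.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model only, NOT CacheLib's allocator.
// Hypothetical helper: return the smallest allocation class that can hold
// `bytes`. `classSizes` must be sorted in ascending order.
std::size_t pickClass(const std::vector<std::size_t>& classSizes,
                      std::size_t bytes) {
  auto it = std::lower_bound(classSizes.begin(), classSizes.end(), bytes);
  assert(it != classSizes.end() && "request larger than the largest class");
  return *it;
}

// Hypothetical helper: count how many distinct allocation classes a
// sequence of request sizes touches.
std::size_t distinctClassesUsed(const std::vector<std::size_t>& classSizes,
                                const std::vector<std::size_t>& requests) {
  std::set<std::size_t> used;
  for (std::size_t r : requests) {
    used.insert(pickClass(classSizes, r));
  }
  return used.size();
}
```

With a single class in use, memory cannot end up stranded in a different class, which is the imbalance described above. A mixed-size workload touches several classes, and that is where a rebalancer starts to matter.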
Sounds good, thanks!
@erangi I'm sorry to bother you. I also want to benchmark CacheLib using YCSB, but I'm not familiar with YCSB.
Hi. I'm getting weird errors while working with CacheLib. The cache creation and first operations complete with no issues, but after a few seconds I start seeing a series of reports such as:
E0108 16:48:53.001791 315636 PoolRebalancer.cpp:50] Rebalancing interrupted due to exception: PoolId 80 is not a compact cache
, eventually ending with a segfault at: facebook::cachelib::MemoryPoolManager::getPoolById(signed char) const+0x20
This smells like a memory issue, but it's consistent, so I don't think it's some random memory overwrite. Can you spot the root cause?
Context:
My CacheLib client code (snippets from the cachelib_api.cpp referred to by cmake):
CMakeLists.txt looks as follows: