invalid allocationClassSizeFactor #30
I ran some local tests and cachelib worked well. It only crashed when I moved to the test cluster.
That's strange. The default in the config is 1.25 here (https://github.com/facebookincubator/CacheLib/blob/main/cachelib/allocator/CacheAllocatorConfig.h#L569). Can you share the stack trace of the exception and also log config.allocationClassSizeFactor before creating the cache through make_unique? Does it always crash, and does the error still happen when you manually set the factor through setDefaultAllocSizes()?
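The logging step suggested above could look roughly like the sketch below. `ConfigSketch` is a hypothetical stand-in that carries only the one field discussed here, not the real CacheLib config type, and `isUsableFactor` / `describeFactor` are made-up helper names. The 1.25 default and the "must be greater than 1.0" rule come from this thread.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical stand-in for the real config: only the field under discussion.
struct ConfigSketch {
  double allocationClassSizeFactor{1.25};  // default quoted in this thread
};

// The allocator rejects factors <= 1.0; NaN and infinity are rejected here too.
bool isUsableFactor(double factor) {
  return std::isfinite(factor) && factor > 1.0;
}

// Builds a log line with enough digits to expose a garbage value.
std::string describeFactor(const ConfigSketch& config) {
  std::ostringstream os;
  os.precision(17);
  os << "allocationClassSizeFactor=" << config.allocationClassSizeFactor
     << (isUsableFactor(config.allocationClassSizeFactor)
             ? " (ok)"
             : " (invalid: must be > 1.0)");
  return os.str();
}
```

Emitting `describeFactor(config)` just before the cache is constructed would show whether the value is already wrong at that point or only becomes wrong inside the allocator.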
Hi, I will do a private build tomorrow to check config.allocationClassSizeFactor and to see whether it still crashes if I set the factor manually. This is the stack trace of the exception.
Hi @sathyaphoenix, I tried a private build again. Surprisingly, cachelib did not crash and allocationClassSizeFactor is as expected.
Thanks for confirming. If you can, please run with ASAN enabled and see if it can provide more information. For now, I am closing this issue. Please reopen if this re-appears and needs investigation.
Hi @sathyaphoenix, we finally found the root cause: we set -DFOLLY_SSE=0 to support AVX512 compiler optimizations. But cachelib requires folly::dynamic and F14Map in NvmConfig, and F14Map requires at least FOLLY_SSE=2. I think cachelib does not check for this case and just throws an error with a confusing message. The error did not appear in the private build because we don't use any compiler optimizations in the private build pipeline. After setting FOLLY_SSE=2 in our master build pipeline, the error went away. Do you think we could add an additional check, or a comment in NvmConfig, to avoid this issue?
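A build-time guard of the kind asked for above might look like the following sketch. It is not something CacheLib ships. The threshold of 2 is taken from this comment, `sseLevelSupportsF14` is a made-up helper name, and the `#ifndef` fallback is only an assumption so the snippet compiles without folly (in a real build the value would come from folly's headers or from a `-DFOLLY_SSE=...` flag).

```cpp
// Fallback for standalone compilation only (an assumption of this sketch).
// In a real build FOLLY_SSE is set by folly or by the compiler command line.
#ifndef FOLLY_SSE
#define FOLLY_SSE 2
#endif

// Hypothetical helper: encodes the "F14Map needs at least FOLLY_SSE=2" rule
// reported in this thread.
constexpr bool sseLevelSupportsF14(int sseLevel) {
  return sseLevel >= 2;
}

// Fails the build with a readable message instead of misbehaving at runtime.
static_assert(sseLevelSupportsF14(FOLLY_SSE),
              "FOLLY_SSE must be >= 2 when F14Map is used");
```

Placed in a header that every translation unit includes, a check like this would turn a `-DFOLLY_SSE=0` build into a compile error.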
@tangliisu Can you share the confusing error message that you see, and also more details on how this causes the double value to be ~0? Also, please note that NvmConfig has moved away from using folly::dynamic in the main branch and now has a simple declarative API to configure it (https://cachelib.org/docs/Cache_Library_User_Guides/Configure_HybridCache). We do rely on F14Map though. Once you share the error message, we can look into an appropriate workaround.
Thanks for the info. We pin cachelib to an old version, so the old NvmConfig is still there. I could not reproduce the allocationClassSizeFactor ~0 error in recent builds. The stack trace from the recent bad build is
which makes sense. But I happened to get the confusing ~0 error before we figured out the FOLLY_SSE=0 issue.
If cachelib still relies on F14Map, I guess we need to keep FOLLY_SSE=2. BTW, we integrated cachelib into our system. The perf is very impressive. We are still working on tuning cachelib to see if we can further reduce the CPU usage.
Great to hear it is working out as expected. Let us know if you need any information for tuning. It is strange though that not setting FOLLY_SSE=2 would cause an unrelated double to be broken. cc @agordon if he has any insights to share.
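One observation that may be relevant: 6.93298464824273e-310 is a subnormal double, and its raw bits start with 0x00007f, which is what a typical 64-bit Linux user-space address looks like. That would be consistent with the field being read from the wrong offset or from uninitialized memory, although nothing in this thread confirms the mechanism. A small standalone check, with helper names made up for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Returns the raw 64-bit pattern of a double without undefined behavior.
std::uint64_t bitsOf(double d) {
  std::uint64_t bits = 0;
  static_assert(sizeof(bits) == sizeof(d), "double is expected to be 64 bits");
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
}

// Heuristic only: a subnormal whose bits look like 0x00007f.......... is
// suspicious because many user-space pointers on 64-bit Linux have that shape.
bool looksLikePointerBits(double d) {
  return std::fpclassify(d) == FP_SUBNORMAL && (bitsOf(d) >> 40) == 0x7f;
}
```

Calling `looksLikePointerBits` on the reported factor returns true, while a legitimate value such as 1.25 returns false.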
When I test cachelib in a test cluster, an error is thrown:
E0818 21:25:01.105298 14 cachelib_cache_handler.cpp:54] invalid factor 6.93298464824273e-310
It is thrown by the check in MemoryAllocator.cpp:
generateAllocSizes { if (factor <= 1.0) { throw std::invalid_argument(folly::sformat("invalid factor {}", factor)); } }
The way I create the cachelib instance is
where cache_size = 76GB.
I didn't set allocationClassSizeFactor anywhere, and I think it defaults to 1.25? I am not sure what this config (allocationClassSizeFactor) is or why it is ~0.
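As background on the question above, the factor controls how quickly allocation class sizes grow from one class to the next, which is why values at or below 1.0 are rejected: the sizes would never increase. The sketch below is a simplified illustration of that idea, not CacheLib's actual generateAllocSizes. The function name, the 8-byte alignment, and the parameters are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative only: grows sizes geometrically from minSize up to maxSize.
std::vector<std::uint32_t> sketchAllocSizes(double factor,
                                            std::uint32_t minSize,
                                            std::uint32_t maxSize) {
  // Written as !(factor > 1.0) so that NaN is rejected as well.
  if (!(factor > 1.0)) {
    throw std::invalid_argument("invalid factor " + std::to_string(factor));
  }
  if (minSize == 0 || minSize > maxSize) {
    throw std::invalid_argument("invalid size range");
  }
  constexpr std::uint64_t kAlign = 8;  // assumed alignment for this sketch
  std::vector<std::uint32_t> sizes;
  double size = minSize;
  while (size < maxSize) {
    // Round up to the alignment, using 64-bit math to avoid overflow.
    const std::uint64_t rounded =
        (static_cast<std::uint64_t>(std::ceil(size)) + kAlign - 1) / kAlign *
        kAlign;
    if (rounded >= maxSize) {
      break;
    }
    // Skip duplicates that alignment can produce for factors close to 1.0.
    if (sizes.empty() || rounded > sizes.back()) {
      sizes.push_back(static_cast<std::uint32_t>(rounded));
    }
    size *= factor;
  }
  sizes.push_back(maxSize);  // the largest class is always maxSize
  return sizes;
}
```

With a factor of 2.0, a minimum of 64, and a maximum of 1024, this yields 64, 128, 256, 512, 1024. A factor like the 6.93298464824273e-310 from the log is rejected up front.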