-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
port folly::JemallocNodumpAllocator #4534
Conversation
cc @interwq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
include/rocksdb/cache_allocator.h
Outdated
// Returns the memory size of the block allocated at p. The default | ||
// implementation that just returns the original allocation_size is fine. | ||
virtual size_t UsableSize(void* /*p*/, size_t allocation_size) const { | ||
// default implementation just returns the allocation size | ||
return allocation_size; | ||
} | ||
}; | ||
|
||
// Factory interface to obtain CacheAllocator instances. | ||
class CacheAllocatorFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name "cache allocator" is very confusing to me. Why not call it something like "block allocator" or something like that (because it allocates blocks)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igorcanadi introduce it in #4437. I don't think "block allocator" is more appropriate since we use it together with Cache
, not BlockCache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocator is used inside BlockFetcher, not cache. The class's function has nothing to do with cache at all. It's just that contained by a cache and users can get it from the cache object. Also, "cache allocator" can be easily misunderstood to be an allocator of caches. Actually it allocates memory. I guess MemoryAllocator is a bad idea as we have another allocator interface inherited by arena. How about MemoryAllocatorWrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we only use it in BlockFetcher doesn't affect the allocator abstraction I guess. Just like the Cache
didn't named BlockCache
and we later reuse it as also as table cache. The allocator interface can be used later to allocate other memory. I don't see problem with CacheAllocator
but I'm fine with MemoryAllocator
. I don't think the class needs to be a wrapper
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MemoryAllocator
feels great to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying I'll make that change in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'll make the name change in #4502.
include/rocksdb/cache_allocator.h
Outdated
virtual const char* Name() const = 0; | ||
|
||
virtual Status NewCacheAllocator( | ||
std::unique_ptr<CacheAllocator>* cache_allocator) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to generate new cache allocator? Another word, do we need this factory at all? Can we just plug in an allocator to allocate blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I come up with the factory class because I want to have ShardedCache
handle allocator sharding. In a followup patch, I'll update ShardedCache
to shard allocator by cache key. But yeah, if we want to shard in another way (round-robin?), or don't need sharding at all, the factory class is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your thoughts on the fragmentation in this per shard allocator setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to @interwq, Jemalloc by default initiate 4 * cpu_cores arenas and assign static arena id to threads. Assume we run on server with 56 cores, that's 224 arenas. Compare with we have by default 64 arenas for block cache, arguably we should have less fragmentation.
For users run multiple DB instance each with dedicated block cache, they can (and should) use a shared CacheAllocator
for those block caches, if they wish to get the no dump option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the fact that with this approach, we are separating allocation for normal c++ objects and block contents. I'm arguing that normal c++ object tend to be smaller than block size. By separating allocation for them, arguably we should see fragmentation reduce (instead of increase) since mixing allocation of different size class tend to increase fragmentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If each allocator class contains configurable number of arenas, it sounds great to me, it is much better than the previous idea of arena per block cache shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug I'm not sure about the assumption that other allocations are all smaller than block size. RocksDB is a library, we don't know how users allocate memory. Even looking at RocksDB, large allcoation outside blocks are just as common, just to name some: malloc for reading uncompressed blocks, exponential readahead in iterators, memtable arena, compaction writeable file buffer, table builder's buffering of key hashes, indexes and bloom filter being built, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. I'll leave the consideration as inline comment in the header file. I don't think we will turn the no dump options on by default. As with any feature, user bother to do so should evaluate if they are willing to make the tradeoff. Of cause we'll be careful with the MyRocks use case.
include/rocksdb/cache_allocator.h
Outdated
@@ -16,14 +22,35 @@ class CacheAllocator { | |||
// Name of the cache allocator, printed in the log | |||
virtual const char* Name() const = 0; | |||
|
|||
// Allocate a block of at least size size | |||
// Allocate a block of at least size. Has to be thread-safe. | |||
virtual void* Allocate(size_t size) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason it is "void*" rather than "char*"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again it was introduced in #4437. But since even malloc
return void*
, I think it is more natural. It means it don't imply typing. Again I don't think the CacaheAllocator
interface is specific to block cache.
cache/jemalloc_nodump_allocator.cc
Outdated
|
||
JemallocNodumpAllocator::~JemallocNodumpAllocator() { | ||
assert(arena_index_ != 0); | ||
std::string key = "arena." + ToString(arena_index_) + ".destroy"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this functionality? Destroying an arena will also release all the memory managed by the arena, which should be done carefully. If the allocator is only created once and never released, it's fine not destroying it at shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On one hand, it is pretty safe to release all memory at this point. When the allocator destruct, it means the Cache
object referencing the allocator is destroyed, which means the cache is not used any more.
On the other hand, the allocator is not necessary only destroy on shutdown. Our API allow user to open a RocksDB instance then close it and then continue with other stuff. Leaking the memory usage would be a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment state the allocator should not be destruct while there's allocated memory in use.
cache/jemalloc_nodump_allocator.cc
Outdated
assert(success || original_alloc == expected); | ||
|
||
// Set the custom hook. | ||
std::unique_ptr<extent_hooks_t> new_hooks(new extent_hooks_t(*hooks)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks we expect there to be only 1 hooks globally? If so a static one (maybe guard this part along with the compare_exchange above) might be better, since the lifetime management for hooks can get tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have separate hooks for each allocator instance, but I didn't find a way to pass non-static member method to mallocx
as new hook. Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I totally follow. Having a extent_hooks_t member in the allocator class should work? Again we will need to ensure the hooks_t is destructed after destroying the arena.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't explain the problem well. Since extent_hooks_t
is a typedef of function pointer, I can only use a static method (JemallocNodumpAllocator::Alloc
) as the hook. From the static method it is not able to access non-static members. That's why I'm forced to make original_alloc_
static. But if there's a way to make it non-static, I'll try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes I'm holding the extent_hooks_t
as non-static member in the class and it will get destruct after arena destruct, by destructor order. So it should work?
@yiwu-arbug has updated the pull request. Re-import the pull request |
@yiwu-arbug has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
24cb318
to
99f3860
Compare
@yiwu-arbug has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@anand1976 I'm landing this one. If you have any comment, I'll address them in subsequent patches. |
Summary:
Introduce
JemallocNodumpAllocator
, which allow exclusion of block cache usage from core dump. It utilize custom hook of jemalloc arena, and when jemalloc arena request memory from system, the allocator use the hook to setMADV_DONTDUMP
to the memory. The implementation is basically the same asfolly::JemallocNodumpAllocator
, except for some minor difference:arena.<i>.destroy
viamallctl
.Depending on #4502.