
port folly::JemallocNodumpAllocator #4534

Closed
wants to merge 1 commit

Conversation

@yiwu-arbug (Contributor) commented Oct 17, 2018

Summary:
Introduce JemallocNodumpAllocator, which allows excluding block cache usage from core dumps. It utilizes jemalloc's custom arena extent hooks: when a jemalloc arena requests memory from the system, the allocator uses the hook to set MADV_DONTDUMP on the memory. The implementation is basically the same as folly::JemallocNodumpAllocator, except for some minor differences:

  1. It only supports jemalloc >= 5.0.
  2. When the allocator is destructed, it explicitly destroys the corresponding arena via the arena.<i>.destroy mallctl.

Depending on #4502.

@yiwu-arbug (Contributor Author)

cc @interwq

@facebook-github-bot (Contributor) left a comment

yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

  // Returns the memory size of the block allocated at p. The default
  // implementation that just returns the original allocation_size is fine.
  virtual size_t UsableSize(void* /*p*/, size_t allocation_size) const {
    // default implementation just returns the allocation size
    return allocation_size;
  }
};

// Factory interface to obtain CacheAllocator instances.
class CacheAllocatorFactory {
Contributor

The name "cache allocator" is very confusing to me. Why not call it something like "block allocator" or something like that (because it allocates blocks)?

Contributor Author

@igorcanadi introduced it in #4437. I don't think "block allocator" is more appropriate, since we use it together with Cache, not BlockCache.

Contributor

The allocator is used inside BlockFetcher, not the cache. The class's function has nothing to do with the cache at all; it's just contained by a cache, and users can get it from the cache object. Also, "cache allocator" can easily be misunderstood as an allocator of caches, when it actually allocates memory. I guess MemoryAllocator is a bad idea, as we have another allocator interface inherited by arena. How about MemoryAllocatorWrapper?

Contributor Author

The fact that we only use it in BlockFetcher doesn't affect the allocator abstraction, I guess, just like Cache wasn't named BlockCache and we later reused it as the table cache as well. The allocator interface can later be used to allocate other memory. I don't see a problem with CacheAllocator, but I'm fine with MemoryAllocator. I don't think the class needs to be a wrapper, though.

Contributor Author

After an offline discussion with @lth, I think something close to MemoryAllocator might be more appropriate. For example, we might want to extend the no-dump option to the memtable by reusing the same code and interface. @siying If MemoryAllocator is not a good name, do you have any suggestions? :)

Contributor

MemoryAllocator feels great to me.

Contributor Author

@siying I'll make that change in a separate PR.

Contributor Author

Actually I'll make the name change in #4502.

  virtual const char* Name() const = 0;

  virtual Status NewCacheAllocator(
      std::unique_ptr<CacheAllocator>* cache_allocator) = 0;
Contributor

Do we need to generate new cache allocators? In other words, do we need this factory at all? Can we just plug in an allocator to allocate blocks?

Contributor Author

I came up with the factory class because I want ShardedCache to handle allocator sharding. In a follow-up patch, I'll update ShardedCache to shard the allocator by cache key. But yeah, if we want to shard in another way (round-robin?), or don't need sharding at all, the factory class is not needed.

Contributor

What are your thoughts on fragmentation in this per-shard allocator setting?

Contributor Author

According to @interwq, jemalloc by default initializes 4 * cpu_cores arenas and assigns static arena ids to threads. Assuming we run on a server with 56 cores, that's 224 arenas. Compared with the 64 arenas we have by default for the block cache, arguably we should have less fragmentation.

Users running multiple DB instances, each with a dedicated block cache, can (and should) use a shared CacheAllocator for those block caches if they wish to get the no-dump option.

Contributor Author

Regarding the fact that with this approach we separate the allocation of normal C++ objects from block contents: I'd argue that normal C++ objects tend to be smaller than the block size. By separating their allocations, arguably we should see fragmentation decrease (instead of increase), since mixing allocations of different size classes tends to increase fragmentation.

Contributor

If each allocator class contains a configurable number of arenas, it sounds great to me; it is much better than the previous idea of an arena per block cache shard.

Contributor

@yiwu-arbug I'm not sure about the assumption that other allocations are all smaller than the block size. RocksDB is a library; we don't know how users allocate memory. Even looking at RocksDB itself, large allocations outside blocks are just as common. To name some: malloc for reading uncompressed blocks, exponential readahead in iterators, the memtable arena, the compaction writable file buffer, and the table builder's buffering of key hashes, indexes, and bloom filters being built.

Contributor Author

Makes sense. I'll leave the consideration as an inline comment in the header file. I don't think we will turn the no-dump option on by default. As with any feature, users who bother to do so should evaluate whether they are willing to make the tradeoff. Of course, we'll be careful with the MyRocks use case.

@@ -16,14 +22,35 @@ class CacheAllocator {
   // Name of the cache allocator, printed in the log
   virtual const char* Name() const = 0;

-  // Allocate a block of at least size size
+  // Allocate a block of at least size. Has to be thread-safe.
   virtual void* Allocate(size_t size) = 0;
Contributor

Is there any reason it is "void*" rather than "char*"?

Contributor Author

Again, it was introduced in #4437. But since even malloc returns void*, I think it is more natural: it doesn't imply a type. Again, I don't think the CacheAllocator interface is specific to the block cache.

@yiwu-arbug yiwu-arbug changed the title Introduce JemallocNodumpAllocator port folly::JemallocNodumpAllocator Oct 19, 2018

JemallocNodumpAllocator::~JemallocNodumpAllocator() {
  assert(arena_index_ != 0);
  std::string key = "arena." + ToString(arena_index_) + ".destroy";

Do we actually need this functionality? Destroying an arena will also release all the memory managed by it, which should be done carefully. If the allocator is only created once and never released, it's fine not to destroy the arena at shutdown.

Contributor Author

On one hand, it is pretty safe to release all the memory at this point: when the allocator is destructed, the Cache object referencing it has already been destroyed, which means the cache is not in use any more.

On the other hand, the allocator is not necessarily destroyed only at shutdown. Our API allows users to open a RocksDB instance, close it, and then continue with other work. Leaking the memory would be a bad idea.

Contributor Author

I'll add a comment stating that the allocator should not be destructed while there is allocated memory still in use.

  assert(success || original_alloc == expected);

  // Set the custom hook.
  std::unique_ptr<extent_hooks_t> new_hooks(new extent_hooks_t(*hooks));

It looks like we expect there to be only one hooks struct globally? If so, a static one (maybe guarding this part along with the compare_exchange above) might be better, since lifetime management for the hooks can get tricky.

Contributor Author

I would like to have separate hooks for each allocator instance, but I didn't find a way to pass a non-static member method to mallocx as the new hook. Any suggestion?


Not sure I totally follow. Having an extent_hooks_t member in the allocator class should work? Again, we will need to ensure the hooks struct is destructed after the arena is destroyed.

Contributor Author

Sorry, I didn't explain the problem well. Since the fields of extent_hooks_t are plain function pointers, I can only use a static method (JemallocNodumpAllocator::Alloc) as the hook, and from the static method it is not possible to access non-static members. That's why I'm forced to make original_alloc_ static. But if there's a way to make it non-static, I'll try.

Contributor Author

And yes, I'm holding the extent_hooks_t as a non-static member in the class, and it will be destructed after the arena is destroyed, by destructor order. So it should work?

@facebook-github-bot
Contributor

@yiwu-arbug has updated the pull request. Re-import the pull request

@facebook-github-bot
Contributor

@yiwu-arbug has updated the pull request. Re-import the pull request

@facebook-github-bot (Contributor) left a comment

yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@interwq left a comment

LGTM!

@facebook-github-bot
Contributor

@yiwu-arbug has updated the pull request. Re-import the pull request

@facebook-github-bot (Contributor) left a comment

yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yiwu-arbug
Contributor Author

@anand1976 I'm landing this one. If you have any comments, I'll address them in subsequent patches.
