
Provide an allocator for new memory type to be used with RocksDB block cache #6214

Closed


@lucagiac81 (Contributor) commented Dec 19, 2019

New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
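
For orientation, the allocator plugs into RocksDB's MemoryAllocator interface and routes every request through the memkind C API. The sketch below is illustrative rather than the PR's exact code: the class name and error handling are placeholders, while memkind_malloc, memkind_free, memkind_malloc_usable_size, and the MEMKIND_DAX_KMEM kind are the real memkind entry points (available since memkind v1.10.0).

#include <cstddef>
#include <new>

#include <memkind.h>

#include "rocksdb/memory_allocator.h"

// Illustrative allocator that serves every request from the
// MEMKIND_DAX_KMEM kind, i.e. from NVDIMM memory exposed as system RAM
// (KMEM DAX) rather than from regular DRAM NUMA nodes.
class KmemDaxAllocatorSketch : public rocksdb::MemoryAllocator {
 public:
  const char* Name() const override { return "KmemDaxAllocatorSketch"; }

  void* Allocate(size_t size) override {
    void* p = memkind_malloc(MEMKIND_DAX_KMEM, size);
    if (p == nullptr) {
      // Error handling is illustrative; a real allocator could instead
      // return nullptr or fall back to DRAM.
      throw std::bad_alloc();
    }
    return p;
  }

  void Deallocate(void* p) override { memkind_free(MEMKIND_DAX_KMEM, p); }

  size_t UsableSize(void* p, size_t /*allocation_size*/) const override {
    return memkind_malloc_usable_size(MEMKIND_DAX_KMEM, p);
  }
};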

Performance

We tested the new allocator using db_bench.

  • For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
  • The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
  • We use a uniform distribution as a worst-case scenario.

The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.

[Figure: readrandom throughput (ops/s) relative to the no-block-cache baseline, for increasing block cache sizes]

Changes

  • Add MemkindKmemAllocator
  • Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
  • Add detection of memkind library with KMEM DAX support
  • Add test for MemkindKmemAllocator

Minimum Requirements

Memory Configuration

The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on memkind’s GitHub page to set up NVDIMM memory accordingly.

Note on memory allocation with NVDIMM memory exposed as system memory:

  • The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
  • The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.

Usage

When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):

#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"

auto cache = NewLRUCache(
    capacity /*size_t, cache size in bytes*/,
    6 /*num_shard_bits*/,
    false /*strict_capacity_limit*/,
    0.0 /*high_pri_pool_ratio (double)*/,
    std::make_shared<MemkindKmemAllocator>());

Refer to RocksDB’s block cache documentation to assign the LRU cache as block cache for a database.
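
In case a concrete example helps, here is a minimal sketch of that wiring using the standard BlockBasedTableOptions / table factory API. The helper name is hypothetical, the using directive is only for brevity, and the shard count and high-priority pool ratio simply mirror the example above.

#include <cstddef>
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "memory/memkind_kmem_allocator.h"

using namespace rocksdb;  // for brevity in this sketch

// Hypothetical helper: build DB options whose block cache is an LRU cache
// backed by MemkindKmemAllocator.
Options OptionsWithKmemBlockCache(size_t cache_capacity_bytes) {
  BlockBasedTableOptions table_options;
  table_options.block_cache = NewLRUCache(
      cache_capacity_bytes, 6 /*num_shard_bits*/,
      false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/,
      std::make_shared<MemkindKmemAllocator>());

  Options options;
  options.create_if_missing = true;
  // The block cache is picked up through the block-based table factory.
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  return options;
}

Any DB opened with the returned options then allocates its uncompressed block cache memory through MemkindKmemAllocator, i.e. from the KMEM DAX nodes.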

@siying (Contributor) left a comment

Thank you for your contribution. I have some comments.

*
* SPDX-License-Identifier: Apache-2.0
*
*/
Contributor

RocksDB supports GPLv2 + Apache v2 dual license. Having some source code with only one of them confuses people and will make adaption more complicated for potential users. I would be reluctant to accept source file like this. Ideally, you give the copyright to RocksDB too. I don't think these source files contain any IP worth protecting so hopefully it's easy for you to do it.

Contributor Author

Headers are updated to include dual license and Facebook copyright.

size_t size = allocator.UsableSize(p, 1024);
ASSERT_GE(size, 1024);
allocator.Deallocate(p);
}
Contributor

It would be interesting to add a test of a full DB with a block cache that allocates from this allocator.

Contributor Author

I added a full DB test using the new allocator. The database is populated and a flush is triggered, so the block cache is used for subsequent reads.
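
For readers who want a picture of what such a test looks like, here is a hedged sketch rather than the PR's actual test: the test name, DB path, cache capacity, and key counts are arbitrary, while ASSERT_OK and test::PerThreadDBPath come from RocksDB's test harness.

#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "memory/memkind_kmem_allocator.h"
#include "test_util/testharness.h"

using namespace rocksdb;  // for brevity in this sketch

TEST(MemkindKmemAllocatorSketchTest, DatabaseBlockCache) {
  // Block cache backed by the memkind-based allocator.
  BlockBasedTableOptions table_options;
  table_options.block_cache =
      NewLRUCache(1 << 30 /*1 GiB, arbitrary*/, 6 /*num_shard_bits*/,
                  false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/,
                  std::make_shared<MemkindKmemAllocator>());

  Options options;
  options.create_if_missing = true;
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));

  DB* db = nullptr;
  std::string dbname = test::PerThreadDBPath("memkind_kmem_sketch");
  ASSERT_OK(DB::Open(options, dbname, &db));

  // Populate, then flush so the data lands in SST files; subsequent reads
  // pull data blocks through the block cache and hence the new allocator.
  for (int i = 0; i < 1000; ++i) {
    ASSERT_OK(db->Put(WriteOptions(), "key" + std::to_string(i),
                      "value" + std::to_string(i)));
  }
  ASSERT_OK(db->Flush(FlushOptions()));

  std::string value;
  for (int i = 0; i < 1000; ++i) {
    ASSERT_OK(db->Get(ReadOptions(), "key" + std::to_string(i), &value));
    ASSERT_EQ(value, "value" + std::to_string(i));
  }

  delete db;
  ASSERT_OK(DestroyDB(dbname, options));
}

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}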

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@lucagiac81 (Contributor Author)

A release of the memkind library with KMEM DAX support is available (v1.10.0). The PR description is updated to reference the release rather than a development commit.

- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add tests for MemkindKmemAllocator
@facebook-github-bot

@lucagiac81 has updated the pull request.

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

This pull request has been merged in 66a95f0.

@dhruba (Contributor) commented Apr 10, 2020

@lucagiac81 great patch, this will enable users to keep the block cache in persistent RAM. I have two questions:

  1. The attached graph shows that when the dataset is entirely cached in the persistent-RAM block cache, the speedup is 2.6x compared to when nothing is cached in the block cache. This is great. If you have any measurements comparing a dataset entirely cached in a persistent-RAM block cache against one entirely cached in a volatile-RAM block cache, please share them with us.

  2. In the graph that you shared (dataset entirely cached in the persistent-RAM block cache), is the system bottlenecked at 100% CPU usage?

siying added a commit to siying/rocksdb that referenced this pull request Apr 10, 2020
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for facebook#6214

Test Plan: Run all existing tests
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for #6214
Pull Request resolved: #6685

Test Plan: Run all existing tests

Reviewed By: cheng-chang

Differential Revision: D20965780

fbshipit-source-id: 195b08d7849513d42fe14073112cd19fdda6af95
@lucagiac81 (Contributor Author)

@dhruba thanks! About your questions:

  1. With the dataset size I used (4 TB), I cannot test the same scenario on DRAM, as I cannot have a cache that large.

  2. CPU is not the bottleneck yet. Due to memory allocation overhead, the dataset doesn’t completely fit in the block cache, so the hit rate is below 100%. If I increase the cache size (and hit rate) further, I can drive CPU utilization and throughput higher within the same latency SLA. Minimizing the allocation overhead is key here. Once I have a best-known method (BKM) for that, I will share more details.

I’d also like to clarify that the persistent nature of the memory is not used here. It is used in volatile mode.

facebook-github-bot pushed a commit that referenced this pull request May 9, 2022
Summary:
Improve memkind library detection in build_detect_platform:

- The current position of -lmemkind does not work with all versions of gcc
- LDFLAGS allows specifying non-standard library path through EXTRA_LDFLAGS

After the change, the options match TBB detection.
This is a follow-up to #6214.

Pull Request resolved: #9134

Reviewed By: ajkr, mrambacher

Differential Revision: D32192028

fbshipit-source-id: 115fafe8d93f1fe6aaf80afb32b2cb67aad074c7