
Provide an allocator for new memory type to be used with RocksDB block cache #6214

Closed


@lucagiac81 (Contributor) commented Dec 19, 2019

New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
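
For orientation, the allocator plugs into RocksDB's MemoryAllocator interface and routes every request through the memkind C API. The sketch below is illustrative rather than the PR's exact code: the class name and error handling are placeholders, while memkind_malloc, memkind_free, memkind_malloc_usable_size, and the MEMKIND_DAX_KMEM kind are the real memkind entry points (available since memkind v1.10.0).

#include <cstddef>
#include <new>

#include <memkind.h>

#include "rocksdb/memory_allocator.h"

// Illustrative allocator that serves every request from the
// MEMKIND_DAX_KMEM kind, i.e. from NVDIMM memory exposed as system RAM
// (KMEM DAX) rather than from regular DRAM NUMA nodes.
class KmemDaxAllocatorSketch : public rocksdb::MemoryAllocator {
 public:
  const char* Name() const override { return "KmemDaxAllocatorSketch"; }

  void* Allocate(size_t size) override {
    void* p = memkind_malloc(MEMKIND_DAX_KMEM, size);
    if (p == nullptr) {
      // Error handling is illustrative; a real allocator could instead
      // return nullptr or fall back to DRAM.
      throw std::bad_alloc();
    }
    return p;
  }

  void Deallocate(void* p) override { memkind_free(MEMKIND_DAX_KMEM, p); }

  size_t UsableSize(void* p, size_t /*allocation_size*/) const override {
    return memkind_malloc_usable_size(MEMKIND_DAX_KMEM, p);
  }
};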

Performance

We tested the new allocator using db_bench.

  • For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
  • The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
  • We use a uniform distribution as a worst-case scenario.

The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.

[Figure: readrandom throughput (ops/s) relative to the no-block-cache baseline, for increasing block cache sizes]

Changes

  • Add MemkindKmemAllocator
  • Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
  • Add detection of memkind library with KMEM DAX support
  • Add test for MemkindKmemAllocator

Minimum Requirements

Memory Configuration

The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on memkind’s GitHub page to set up NVDIMM memory accordingly.

Note on memory allocation with NVDIMM memory exposed as system memory:

  • The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
  • The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.

Usage

When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):

#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"

auto cache = NewLRUCache(
    capacity /*size_t, cache size in bytes*/,
    6 /*num_shard_bits*/,
    false /*strict_capacity_limit*/,
    0.0 /*high_pri_pool_ratio (double)*/,
    std::make_shared<MemkindKmemAllocator>());

Refer to RocksDB’s block cache documentation to assign the LRU cache as block cache for a database.
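
In case a concrete example helps, here is a minimal sketch of that wiring using the standard BlockBasedTableOptions / table factory API. The helper name is hypothetical, the using directive is only for brevity, and the shard count and high-priority pool ratio simply mirror the example above.

#include <cstddef>
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "memory/memkind_kmem_allocator.h"

using namespace rocksdb;  // for brevity in this sketch

// Hypothetical helper: build DB options whose block cache is an LRU cache
// backed by MemkindKmemAllocator.
Options OptionsWithKmemBlockCache(size_t cache_capacity_bytes) {
  BlockBasedTableOptions table_options;
  table_options.block_cache = NewLRUCache(
      cache_capacity_bytes, 6 /*num_shard_bits*/,
      false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/,
      std::make_shared<MemkindKmemAllocator>());

  Options options;
  options.create_if_missing = true;
  // The block cache is picked up through the block-based table factory.
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));
  return options;
}

Any DB opened with the returned options then allocates its uncompressed block cache memory through MemkindKmemAllocator, i.e. from the KMEM DAX nodes.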

@siying (Contributor) left a comment

Thank you for your contribution. I have some comments.

*
* SPDX-License-Identifier: Apache-2.0
*
*/
Contributor

RocksDB supports GPLv2 + Apache v2 dual license. Having some source code with only one of them confuses people and will make adaption more complicated for potential users. I would be reluctant to accept source file like this. Ideally, you give the copyright to RocksDB too. I don't think these source files contain any IP worth protecting so hopefully it's easy for you to do it.

Contributor Author

Headers are updated to include dual license and Facebook copyright.

size_t size = allocator.UsableSize(p, 1024);
ASSERT_GE(size, 1024);
allocator.Deallocate(p);
}
Contributor

It would be interesting to add a test of a full DB with a block cache that allocates from this allocator.

Contributor Author

I added a full DB test using the new allocator. The database is populated and a flush is triggered, so the block cache is used for subsequent reads.
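
For readers who want a picture of what such a test looks like, here is a hedged sketch rather than the PR's actual test: the test name, DB path, cache capacity, and key counts are arbitrary, while ASSERT_OK and test::PerThreadDBPath come from RocksDB's test harness.

#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "memory/memkind_kmem_allocator.h"
#include "test_util/testharness.h"

using namespace rocksdb;  // for brevity in this sketch

TEST(MemkindKmemAllocatorSketchTest, DatabaseBlockCache) {
  // Block cache backed by the memkind-based allocator.
  BlockBasedTableOptions table_options;
  table_options.block_cache =
      NewLRUCache(1 << 30 /*1 GiB, arbitrary*/, 6 /*num_shard_bits*/,
                  false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/,
                  std::make_shared<MemkindKmemAllocator>());

  Options options;
  options.create_if_missing = true;
  options.table_factory.reset(NewBlockBasedTableFactory(table_options));

  DB* db = nullptr;
  std::string dbname = test::PerThreadDBPath("memkind_kmem_sketch");
  ASSERT_OK(DB::Open(options, dbname, &db));

  // Populate, then flush so the data lands in SST files; subsequent reads
  // pull data blocks through the block cache and hence the new allocator.
  for (int i = 0; i < 1000; ++i) {
    ASSERT_OK(db->Put(WriteOptions(), "key" + std::to_string(i),
                      "value" + std::to_string(i)));
  }
  ASSERT_OK(db->Flush(FlushOptions()));

  std::string value;
  for (int i = 0; i < 1000; ++i) {
    ASSERT_OK(db->Get(ReadOptions(), "key" + std::to_string(i), &value));
    ASSERT_EQ(value, "value" + std::to_string(i));
  }

  delete db;
  ASSERT_OK(DestroyDB(dbname, options));
}

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}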

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@lucagiac81 (Contributor Author)

A release of the memkind library with KMEM DAX support is available (v1.10.0). The PR description is updated to reference the release rather than a development commit.

- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add tests for MemkindKmemAllocator
@facebook-github-bot

@lucagiac81 has updated the pull request.

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

This pull request has been merged in 66a95f0.

@dhruba (Contributor) commented Apr 10, 2020

@lucagiac81 great patch, this will enable users to keep the block cache in persistent RAM. I have two questions:

  1. The attached graph shows that when the dataset is entirely cached in the persistent-RAM block cache, the speedup is 2.6x compared to when nothing is cached in the block cache. This is great. If you have any measurements comparing a dataset entirely cached in a persistent-RAM block cache against one entirely cached in a volatile-RAM block cache, please share them with us.

  2. In the graph that you shared (dataset entirely cached in the persistent-RAM block cache), is the system bottlenecked at 100% CPU usage?

siying added a commit to siying/rocksdb that referenced this pull request Apr 10, 2020
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for facebook#6214

Test Plan: Run all existing tests
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for #6214
Pull Request resolved: #6685

Test Plan: Run all existing tests

Reviewed By: cheng-chang

Differential Revision: D20965780

fbshipit-source-id: 195b08d7849513d42fe14073112cd19fdda6af95
@lucagiac81 (Contributor Author)

@dhruba thanks! About your questions:

  1. With the dataset size I used (4 TB), I cannot test the same scenario on DRAM, as I cannot have a cache that large.

  2. CPU is not the bottleneck yet. Due to memory allocation overhead, the dataset doesn’t completely fit in the block cache, so the hit rate is below 100%. If I increase the cache size (and hit rate) further, I can drive CPU utilization and throughput higher within the same latency SLA. Minimizing the allocation overhead is key here. Once I have a best-known method (BKM) for that, I will share more details.

I’d also like to clarify that the persistent nature of the memory is not used here. It is used in volatile mode.

facebook-github-bot pushed a commit that referenced this pull request May 9, 2022
Summary:
Improve memkind library detection in build_detect_platform:

- The current position of -lmemkind does not work with all versions of gcc
- LDFLAGS allows specifying non-standard library path through EXTRA_LDFLAGS

After the change, the options match TBB detection.
This is a follow-up to #6214.

Pull Request resolved: #9134

Reviewed By: ajkr, mrambacher

Differential Revision: D32192028

fbshipit-source-id: 115fafe8d93f1fe6aaf80afb32b2cb67aad074c7