
Initial support for secondary cache in LRUCache #8271

Closed
wants to merge 9 commits

Conversation

@anand1976 (Contributor) commented May 5, 2021

Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr<SecondaryCache>. An item is initially inserted into the LRU (primary) cache. When it ages out and is evicted from memory, it is inserted into the secondary cache. On an LRU cache miss followed by a successful lookup in the secondary cache, the item is promoted to the LRU cache. Only synchronous lookup is supported currently. The secondary cache could be used to implement a persistent (flash) cache or a compressed cache.
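
For illustration, here is a minimal sketch of the kind of abstract interface described above. It is a sketch only: SecondaryCacheResultHandle and the exact signatures are assumptions based on this description, not a copy of secondary_cache.h.

// Sketch only: approximates the abstract interface described above;
// the actual declarations in include/rocksdb/secondary_cache.h may differ.
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"

namespace ROCKSDB_NAMESPACE {

// Result of a secondary cache lookup. With the synchronous-only support
// in this PR the result is ready on return; IsReady()/Wait() anticipate
// the asynchronous lookup noted as a TODO.
class SecondaryCacheResultHandle {
 public:
  virtual ~SecondaryCacheResultHandle() = default;
  virtual bool IsReady() = 0;
  virtual void Wait() = 0;
  virtual void* Value() = 0;  // nullptr if the retrieval failed
  virtual size_t Size() = 0;
};

class SecondaryCache {
 public:
  virtual ~SecondaryCache() = default;

  // Called when an item ages out of and is evicted from the primary
  // (LRU) cache.
  virtual Status Insert(const Slice& key, void* value,
                        const Cache::CacheItemHelper* helper) = 0;

  // Called on a primary cache miss; on a hit here, the item is
  // promoted back into the LRU cache.
  virtual std::unique_ptr<SecondaryCacheResultHandle> Lookup(
      const Slice& key, const Cache::CreateCallback& create_cb,
      bool wait) = 0;
};

}  // namespace ROCKSDB_NAMESPACE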

Tests:
Results from cache_bench and db_bench don't show any regression due to these changes.

cache_bench results before and after this change -
Command
./cache_bench -ops_per_thread=10000000 -threads=1
Before
Complete in 40.688 s; QPS = 245774
Complete in 40.486 s; QPS = 246996
Complete in 42.019 s; QPS = 237989
After
Complete in 40.672 s; QPS = 245869
Complete in 44.622 s; QPS = 224107
Complete in 42.445 s; QPS = 235599

db_bench results before this change, and with this change + #8213 and #8191 -
Commands
./db_bench --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true

./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300
Before

DB path: [/home/anand76/nvm_cache/db]                                                                                                                                        
readrandom   :      80.702 micros/op 198104 ops/sec;   54.4 MB/s (3708999 of 3708999 found)
DB path: [/home/anand76/nvm_cache/db]                                                                                                                                        
readrandom   :      87.124 micros/op 183625 ops/sec;   50.4 MB/s (3439999 of 3439999 found) 

After

DB path: [/home/anand76/nvm_cache/db]                                                                                                                                        
readrandom   :      77.653 micros/op 206025 ops/sec;   56.6 MB/s (3866999 of 3866999 found)
DB path: [/home/anand76/nvm_cache/db]                                                                                                                                        
readrandom   :      84.962 micros/op 188299 ops/sec;   51.7 MB/s (3535999 of 3535999 found)

@pdillinger (Contributor) left a comment:

Although the design is general, it seems to make the "simple" or "normal" case complicated.

Normally, we store a mix of "parsed" (e.g. ParsedFullFilterBlock or Block) and "no parsing necessary" (BlockContents) data structures in block cache.

I expect the "normal" case for tiered cache to be storing just the BlockContents, perhaps with recompression or something. Do we have plans for something more elaborate?

Did we consider an alternate design like this: instead of adding an "unparsing" layer to get block cache data to tiered cache, codify the parsing part of the volatile tier into the block cache itself, so that the tiered cache has direct access to the unparsed version (BlockContents)? It seems like this would involve exposing some CacheableEntry / BlocklikeTraits functionality into block cache (seems reasonable) and save the ugliness (and execution cost?) of coding up (and executing) an unparsing step.

// mutex if we're going to lookup in the NVM cache
// Only support synchronous for now
// TODO: Support asynchronous lookup in NVM cache
if (!e && tiered_cache_ && helper_cb && wait) {

@pdillinger (Contributor):
LRUCache is among the most performance-sensitive pieces of code we have. Have you evaluated the performance impact of these changes (e.g. cache_bench)? Do you see an efficient path to planned changes?


@anand1976 (Contributor Author):

Good point. I'll run benchmarks to check for regression.

void Free() {
  assert(refs == 0);
  if (deleter) {
    (*deleter)(key(), value);
  if (!IsTieredCacheCompatible() && info_.deleter) {

@pdillinger (Contributor):

It also seems awkward to wedge this new capability into one implementation, rather than extending it to (perhaps) more naturally support the data.

@pdillinger (Contributor):

> It seems like this would involve exposing some CacheableEntry / BlocklikeTraits functionality into block cache (seems reasonable) ...

Except perhaps for Cache API compatibility, which I would argue is not so critical.

@anand1976 (Contributor Author):

> Although the design is general, it seems to make the "simple" or "normal" case complicated.
>
> Normally, we store a mix of "parsed" (e.g. ParsedFullFilterBlock or Block) and "no parsing necessary" (BlockContents) data structures in block cache.

We store BlockContents only in the compressed block cache. The uncompressed block cache mostly has objects that require some parsing.

> I expect the "normal" case for tiered cache to be storing just the BlockContents, perhaps with recompression or something. Do we have plans for something more elaborate?

For the time being, yes. We may be able to do something more elaborate with PM, like a deep copy, so it'd be good to retain the flexibility.

> Did we consider an alternate design like this: instead of adding an "unparsing" layer to get block cache data to tiered cache, codify the parsing part of the volatile tier into the block cache itself, so that the tiered cache has direct access to the unparsed version (BlockContents)? It seems like this would involve exposing some CacheableEntry / BlocklikeTraits functionality into block cache (seems reasonable) and save the ugliness (and execution cost?) of coding up (and executing) an unparsing step.

Interesting. I hadn't considered this design. I'm not sure the block cache would need to be exposed to the BlocklikeTraits functionality. The Insert API could just take a pointer to the raw data, right, since the assumption is that no special parsing is needed?

I'm not too concerned about the execution cost of "unparsing", since it's just extracting the raw data pointer. The bigger overhead is memcpy'ing the data. OTOH, I think it's too restrictive to assume that the tiered cache would only ever need access to the BlockContents/raw data.
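
To picture the tradeoff being discussed here, a rough sketch of the helper-callback approach, where each cached type supplies callbacks that "unparse" it to raw bytes on demotion. The callback shapes loosely follow this PR's CacheItemHelper idea, but ParsedBlock, BlockSize, and BlockSaveTo are hypothetical stand-ins, not RocksDB code.

// Illustrative sketch only; ParsedBlock stands in for a parsed object
// (e.g. Block) that retains its underlying raw data.
#include <cstddef>
#include <cstring>
#include <string>

struct ParsedBlock {
  std::string raw;  // the BlockContents-like raw bytes
  // ... parsed structures (restart points, index, etc.) would go here
};

using SizeCallback = size_t (*)(void* obj);
using SaveToCallback = bool (*)(void* obj, size_t offset, size_t size,
                                char* out);

struct CacheItemHelper {
  SizeCallback size_cb;      // raw size of the object when serialized
  SaveToCallback saveto_cb;  // "unparse": copy raw bytes out on demotion
};

// For a parsed block, "unparsing" just exposes the raw pointer, so the
// dominant cost is the memcpy into the secondary cache's buffer.
static size_t BlockSize(void* obj) {
  return static_cast<ParsedBlock*>(obj)->raw.size();
}

static bool BlockSaveTo(void* obj, size_t offset, size_t size, char* out) {
  const auto* b = static_cast<const ParsedBlock*>(obj);
  if (offset + size > b->raw.size()) return false;
  std::memcpy(out, b->raw.data() + offset, size);
  return true;
}

static const CacheItemHelper kBlockHelper{BlockSize, BlockSaveTo};

The alternative design raised above would instead keep the raw BlockContents directly in the block cache, making the save-to step trivial for the common case.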

@anand1976 requested a review from pdillinger on May 11, 2021 00:31
@facebook-github-bot (Contributor): @anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zhichao-cao (Contributor) left a comment:

Thanks for working on it. In general it looks good to me. The title and description need to be updated to "Secondary cache". Since it is experimental, maybe we do not need to log the change to HISTORY.md for now?

@facebook-github-bot (Contributor): @anand1976 has updated the pull request. You must reimport the pull request before landing.

@zhichao-cao (Contributor) left a comment:

LGTM, thanks!

@anand1976 changed the title from "Initial support for tiered cache in LRUCache" to "Initial support for secondary cache in LRUCache" on May 12, 2021

@pdillinger (Contributor) left a comment:

I didn't dig into tests, but I'm now satisfied with the interface and indications of no notable performance regression.

// it's not ready. The caller should then call Value() to check if the
// item was successfully retrieved. If unsuccessful (perhaps due to an
// IO error), Value() will return nullptr.
virtual Handle* Lookup(const Slice& key, const CacheItemHelper* /*helper_cb*/,
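
A rough usage sketch of the synchronous path the quoted comment describes; the parameter order of the extended Lookup overload is an assumption based on this PR, and LookupBlocking is a hypothetical helper:

// Sketch only: assumes the Lookup overload added by this PR takes
// (key, helper, create_cb, priority, wait).
#include "rocksdb/cache.h"

using ROCKSDB_NAMESPACE::Cache;
using ROCKSDB_NAMESPACE::Slice;

// Hypothetical helper: returns a usable handle, or nullptr if the entry
// is absent or the secondary cache retrieval failed.
Cache::Handle* LookupBlocking(Cache* cache, const Slice& key,
                              const Cache::CacheItemHelper* helper,
                              const Cache::CreateCallback& create_cb) {
  Cache::Handle* handle = cache->Lookup(key, helper, create_cb,
                                        Cache::Priority::LOW, /*wait=*/true);
  if (handle != nullptr && cache->Value(handle) == nullptr) {
    // Per the comment above, Value() == nullptr means the retrieval
    // failed (perhaps due to an IO error); treat it as a miss.
    cache->Release(handle);
    handle = nullptr;
  }
  return handle;
}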

@pdillinger (Contributor):
With the waiting functionality, consider (probably future PR) whether it's worth fixing the race condition between failed lookup and insert. Although our "redundant" block cache insertion statistics suggest it's not a problem currently, with more moves toward remote storage, we could run into more "thundering herd" type problems. My prototype is here: pdillinger@513affa#diff-e9d84127b0d5363ff842cce3fca54dfbffec5d9955f810e7678d69c5c5904e3fR227


@facebook-github-bot (Contributor): @anand1976 merged this pull request in feb06e8.

facebook-github-bot pushed a commit that referenced this pull request May 22, 2021
…ache (#8315)

Summary:
Secondary cache is implemented to achieve the secondary cache tier for block cache. New Insert and Lookup APIs were introduced in #8271. To support and use the secondary cache in the block based table reader, this PR introduces the corresponding callback functions that will be used by the secondary cache, and updates the Insert and Lookup APIs accordingly.

benchmarking:
./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true

./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000

master benchmarking results:
readrandom   :       3.923 micros/op 254881 ops/sec;   33.4 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064

Current PR benchmarking results:
readrandom   :       4.083 micros/op 244925 ops/sec;   32.1 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053

About 3.8% throughput reduction.
P50: 5.2% increase; P95: 2.09% increase; P99: 4.77% improvement.

Pull Request resolved: #8315

Test Plan: added the testing case

Reviewed By: anand1976

Differential Revision: D28599774

Pulled By: zhichao-cao

fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed