Initial support for secondary cache in LRUCache #8271
Conversation
Although the design is general, it seems to make the "simple" or "normal" case complicated.
Normally, we store a mix of "parsed" (e.g. ParsedFullFilterBlock or Block) and "no parsing necessary" (BlockContents) data structures in block cache.
I expect the "normal" case for tiered cache to be storing just the BlockContents, perhaps with recompression or something. Do we have plans for something more elaborate?
Did we consider an alternate design like this: instead of adding an "unparsing" layer to get block cache data to tiered cache, codify the parsing part of the volatile tier into the block cache itself, so that the tiered cache has direct access to the unparsed version (BlockContents)? It seems like this would involve exposing some CacheableEntry / BlocklikeTraits functionality into block cache (seems reasonable) and save the ugliness (and execution cost?) of coding up (and executing) an unparsing step.
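A minimal sketch of that alternative, with purely hypothetical names (`RawBlock` standing in for `BlockContents`, `TierFriendlyCache` for the block cache), not the actual RocksDB API: the cache always keeps the unparsed form available for a secondary tier and builds the parsed object lazily, so no "unparsing" step is needed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for BlockContents: just owned raw bytes.
struct RawBlock {
  std::vector<char> data;
};

// Stand-in for a parsed object (e.g. Block), built lazily from the raw form.
struct ParsedBlock {
  explicit ParsedBlock(const RawBlock& raw)
      : contents(raw.data.begin(), raw.data.end()) {}
  std::string contents;
};

// Each entry keeps the raw form always available for the secondary tier and
// caches the parsed form after first use.
struct Entry {
  RawBlock raw;
  std::unique_ptr<ParsedBlock> parsed;  // nullptr until first Lookup()
};

class TierFriendlyCache {
 public:
  void Insert(const std::string& key, RawBlock raw) {
    map_[key].raw = std::move(raw);
  }
  // Volatile-tier path: parse on first access, then reuse the parsed object.
  ParsedBlock* Lookup(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    if (!it->second.parsed) {
      it->second.parsed = std::make_unique<ParsedBlock>(it->second.raw);
    }
    return it->second.parsed.get();
  }
  // Secondary-tier path: direct access to the unparsed bytes, no callback.
  const RawBlock* RawLookup(const std::string& key) const {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : &it->second.raw;
  }

 private:
  std::map<std::string, Entry> map_;
};
```

The design choice this illustrates: the parse direction (raw to parsed) lives inside the cache, so the secondary tier never needs the reverse step.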
cache/lru_cache.cc (outdated)

```cpp
// mutex if we're going to lookup in the NVM cache
// Only support synchronous for now
// TODO: Support asynchronous lookup in NVM cache
if (!e && tiered_cache_ && helper_cb && wait) {
```
LRUCache is among the most performance-sensitive pieces of code we have. Have you evaluated the performance impact of these changes (e.g. cache_bench)? Do you see an efficient path to planned changes?
Good point. I'll run benchmarks to check for regression.
cache/lru_cache.h (outdated)

```cpp
void Free() {
  assert(refs == 0);
  if (deleter) {
    (*deleter)(key(), value);
```

changed in this diff to:

```cpp
  if (!IsTieredCacheCompatible() && info_.deleter) {
```
It also seems awkward to wedge this new capability into one implementation, rather than extending the interface to (perhaps) more naturally support the data. Except perhaps for Cache API compatibility, which I would argue is not so critical.
We store BlockContents only in the compressed block cache. The uncompressed block cache mostly has objects that require some parsing.
For the time being, yes. We may be able to do something more elaborate with PM, like a deep copy, so it'd be good to retain the flexibility.
Interesting. I hadn't considered this design. I'm not sure the block cache would need to be exposed to the BlocklikeTraits functionality. The Insert API could just take a pointer to the raw data, right, since the assumption is that no special parsing is needed? I'm not too concerned about the execution cost of "unparsing", since it's just extracting the raw data pointer. The bigger overhead is memcpy'ing the data. OTOH, I think it's too restrictive to assume that the tiered cache would only ever need access to the BlockContents/raw data.
@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for working on it. In general it looks good to me. The title and description need to be updated to "Secondary cache". Since it is experimental, maybe we do not need to log the change to HISTORY.md for now?
@anand1976 has updated the pull request. You must reimport the pull request before landing.
LGTM, thanks!
I didn't dig into tests, but I'm now satisfied with the interface and indications of no notable performance regression.
```cpp
// it's not ready. The caller should then call Value() to check if the
// item was successfully retrieved. If unsuccessful (perhaps due to an
// IO error), Value() will return nullptr.
virtual Handle* Lookup(const Slice& key, const CacheItemHelper* /*helper_cb*/,
```
With the waiting functionality, consider (probably future PR) whether it's worth fixing the race condition between failed lookup and insert. Although our "redundant" block cache insertion statistics suggest it's not a problem currently, with more moves toward remote storage, we could run into more "thundering herd" type problems. My prototype is here: pdillinger@513affa#diff-e9d84127b0d5363ff842cce3fca54dfbffec5d9955f810e7678d69c5c5904e3fR227
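The coordination idea behind fixing that race can be sketched as coalescing concurrent misses per key, so only one thread performs the slow fetch while the others wait for its insert. This is a hedged, simplified illustration with hypothetical names, not the linked prototype:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <set>
#include <string>

class CoalescingCache {
 public:
  // Returns the cached value, invoking fetch at most once per key even
  // under concurrent misses on the same key.
  std::string GetOrFetch(const std::string& key,
                         const std::function<std::string()>& fetch) {
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      auto it = cache_.find(key);
      if (it != cache_.end()) return it->second;  // hit
      if (pending_.insert(key).second) break;     // we own the fetch
      cv_.wait(lk);  // another thread is fetching this key; wait and re-check
    }
    lk.unlock();
    std::string value = fetch();  // slow path, performed outside the mutex
    lk.lock();
    cache_[key] = value;
    pending_.erase(key);
    cv_.notify_all();  // wake any waiters so they see the inserted value
    return value;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, std::string> cache_;
  std::set<std::string> pending_;
};
```

The re-check loop around `cv_.wait` handles spurious wakeups and the case where the owning thread's fetch fails or its insert is evicted before the waiter runs.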
@anand1976 merged this pull request in feb06e8. |
…ache (#8315)

Summary: Secondary cache is implemented to provide the secondary cache tier for block cache. New Insert and Lookup APIs were introduced in #8271. To support and use the secondary cache in the block-based table reader, this PR introduces the corresponding callback functions that will be used by the secondary cache, and updates the Insert and Lookup APIs accordingly.

Benchmarking:

./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true

./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000

Master benchmarking results:
readrandom : 3.923 micros/op 254881 ops/sec; 33.4 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064

Current PR benchmarking results:
readrandom : 4.083 micros/op 244925 ops/sec; 32.1 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053

About 3.8% throughput reduction. P50: 5.2% increase; P95: 2.09% increase; P99: 4.77% improvement.

Pull Request resolved: #8315
Test Plan: added the testing case
Reviewed By: anand1976
Differential Revision: D28599774
Pulled By: zhichao-cao
fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed
Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr to it. An item is initially inserted into the LRU (primary) cache. When it ages out and is evicted from memory, it is inserted into the secondary cache. On an LRU cache miss followed by a successful lookup in the secondary cache, the item is promoted to the LRU cache. Only synchronous lookup is supported for now. The secondary cache could be used to implement a persistent (flash) cache or a compressed cache.
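The demote-on-eviction / promote-on-hit flow described above can be sketched as follows. All names here are hypothetical and the code is far simpler than the actual include/rocksdb/secondary_cache.h interface; it only illustrates the tiering behavior.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <memory>
#include <string>

// Stand-in for the secondary tier, e.g. a flash-backed or compressed cache.
struct SecondaryCache {
  virtual ~SecondaryCache() = default;
  virtual void Insert(const std::string& key, const std::string& v) = 0;
  virtual bool Lookup(const std::string& key, std::string* v) = 0;
};

// Trivial in-memory secondary cache for demonstration.
struct MapSecondaryCache : SecondaryCache {
  void Insert(const std::string& key, const std::string& v) override {
    store[key] = v;
  }
  bool Lookup(const std::string& key, std::string* v) override {
    auto it = store.find(key);
    if (it == store.end()) return false;
    *v = it->second;
    return true;
  }
  std::map<std::string, std::string> store;
};

class TieredLRU {
 public:
  TieredLRU(size_t cap, std::shared_ptr<SecondaryCache> sec)
      : cap_(cap), sec_(std::move(sec)) {}

  void Insert(const std::string& key, std::string v) {
    Touch(key);
    map_[key] = std::move(v);
    while (map_.size() > cap_) {  // demote the LRU victim to the secondary tier
      const std::string victim = order_.back();
      order_.pop_back();
      if (sec_) sec_->Insert(victim, map_[victim]);
      map_.erase(victim);
    }
  }

  bool Lookup(const std::string& key, std::string* v) {
    auto it = map_.find(key);
    if (it != map_.end()) {  // primary hit
      Touch(key);
      *v = it->second;
      return true;
    }
    // Primary miss: synchronous secondary lookup, then promote on success.
    if (sec_ && sec_->Lookup(key, v)) {
      Insert(key, *v);
      return true;
    }
    return false;
  }

 private:
  void Touch(const std::string& key) {
    order_.remove(key);       // O(n); fine for a sketch
    order_.push_front(key);
  }
  size_t cap_;
  std::shared_ptr<SecondaryCache> sec_;
  std::map<std::string, std::string> map_;
  std::list<std::string> order_;  // front = most recently used
};
```

Note that promotion reuses Insert, so promoting one item may in turn demote another to the secondary tier, mirroring the two-way movement described in the PR summary.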
Tests:
Results from cache_bench and db_bench don't show any regression due to these changes.
cache_bench results before and after this change -
Command
./cache_bench -ops_per_thread=10000000 -threads=1
Before
Complete in 40.688 s; QPS = 245774
Complete in 40.486 s; QPS = 246996
Complete in 42.019 s; QPS = 237989
After
Complete in 40.672 s; QPS = 245869
Complete in 44.622 s; QPS = 224107
Complete in 42.445 s; QPS = 235599
db_bench results before this change, and with this change + #8213 and #8191 -
Commands
./db_bench --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true
./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300
Before
After