Move the uncompression dictionary object out of the block cache #5584

Closed
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -601,6 +601,7 @@ set(SOURCES
   table/block_based/full_filter_block.cc
   table/block_based/index_builder.cc
   table/block_based/partitioned_filter_block.cc
+  table/block_based/uncompression_dict_reader.cc
   table/block_fetcher.cc
   table/bloom_block.cc
   table/cuckoo/cuckoo_table_builder.cc
7 changes: 5 additions & 2 deletions HISTORY.md
@@ -6,9 +6,10 @@

 ### Public API Change
 * Now DB::Close() will return an Aborted() error when there is an unreleased snapshot. Users can retry after all snapshots are released.
-* Index and filter blocks are now handled similarly to data blocks with regard to the block cache: instead of storing reader objects in the cache, only the blocks themselves are cached. In addition, index and filter blocks (as well as filter partitions) no longer get evicted from the cache when a table is closed. Moreover, index blocks can now use the compressed block cache (if any).
+* Index, filter, and compression dictionary blocks are now handled similarly to data blocks with regard to the block cache: instead of storing objects in the cache, only the blocks themselves are cached. In addition, index, filter, and compression dictionary blocks (as well as filter partitions) no longer get evicted from the cache when a table is closed. Moreover, index blocks can now use the compressed block cache (if any), and cached index blocks can be shared among multiple table readers.
 * Partitions of partitioned indexes no longer affect the read amplification statistics.
-* Due to the above refactoring, block cache eviction statistics for indexes and filters are temporarily broken. We plan to reintroduce them in a later phase.
+* Due to the above refactoring, block cache eviction statistics for indexes, filters, and compression dictionaries are temporarily broken. We plan to reintroduce them in a later phase.
+* Errors related to the retrieval of the compression dictionary are now propagated to the user.
 * options.keep_log_file_num will be enforced strictly at all times. File names of all log files will be tracked, which may take a significant amount of memory if options.keep_log_file_num is large and either options.max_log_file_size or options.log_file_time_to_roll is set.
 * Add initial support for Get/Put with user timestamps. Users can specify timestamps via ReadOptions and WriteOptions when calling DB::Get and DB::Put.
 * Accessing a partition of a partitioned filter or index through a pinned reference is no longer considered a cache hit.
@@ -25,6 +26,7 @@
 * Allow DBImplSecondary to remove memtables with obsolete data after replaying MANIFEST and WAL.
 * Add an option `failed_move_fall_back_to_copy` (default is true) for external SST ingestion. When `move_files` is true and the hard link fails, ingestion falls back to copy if `failed_move_fall_back_to_copy` is true. Otherwise, ingestion reports an error.
 * Add argument `--secondary_path` to ldb to open the database as the secondary instance. This keeps the original DB intact.
+* Compression dictionary blocks are now prefetched and pinned in the cache (based on the customer's settings) the same way as index and filter blocks.

 ### Performance Improvements
 * Reduce binary search when an iterator reseeks into the same data block.
@@ -34,6 +36,7 @@
 * Log Writer will flush after finishing the whole record, rather than a fragment.
 * Lower MultiGet batching API latency by reading data blocks from disk in parallel.
 * Improve performance of row_cache: make reads with newer snapshots than data in an SST file share the same cache key, except in some transaction cases.
+* The compression dictionary is no longer copied to a new object upon retrieval.

 ### General Improvements
 * Added new status code kColumnFamilyDropped to distinguish between Column Family Dropped and DB Shutdown in progress.
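The changelog entries above mean the existing index/filter caching and pinning knobs now also govern the compression dictionary block. Below is a minimal, illustrative configuration sketch (not part of this PR): the `BlockBasedTableOptions` fields are real, but the cache size, dictionary size, and database path are arbitrary placeholders, and it assumes RocksDB was built with ZSTD support.

```cpp
#include <cassert>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Enable dictionary compression so SST files carry a dictionary
  // meta-block (assumes a ZSTD-enabled build).
  options.compression = rocksdb::kZSTD;
  options.compression_opts.max_dict_bytes = 16 * 1024;

  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(64 << 20);
  // Cache metadata blocks in the block cache; after this PR, the
  // uncompression dictionary block follows the same policy as index
  // and filter blocks.
  table_options.cache_index_and_filter_blocks = true;
  // Pin L0 metadata blocks so frequently read files keep their
  // dictionaries (and indexes/filters) resident.
  table_options.pin_l0_filter_and_index_blocks_in_cache = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/dict_cache_demo", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```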
1 change: 1 addition & 0 deletions TARGETS
@@ -198,6 +198,7 @@ cpp_library(
     "table/block_based/full_filter_block.cc",
     "table/block_based/index_builder.cc",
     "table/block_based/partitioned_filter_block.cc",
+    "table/block_based/uncompression_dict_reader.cc",
     "table/block_fetcher.cc",
     "table/bloom_block.cc",
     "table/cuckoo/cuckoo_table_builder.cc",
67 changes: 51 additions & 16 deletions db/db_block_cache_test.cc
@@ -19,6 +19,9 @@ class DBBlockCacheTest : public DBTestBase {
   size_t hit_count_ = 0;
   size_t insert_count_ = 0;
   size_t failure_count_ = 0;
+  size_t compression_dict_miss_count_ = 0;
+  size_t compression_dict_hit_count_ = 0;
+  size_t compression_dict_insert_count_ = 0;
   size_t compressed_miss_count_ = 0;
   size_t compressed_hit_count_ = 0;
   size_t compressed_insert_count_ = 0;
@@ -69,6 +72,15 @@
         TestGetTickerCount(options, BLOCK_CACHE_COMPRESSED_ADD_FAILURES);
   }

+  void RecordCacheCountersForCompressionDict(const Options& options) {
+    compression_dict_miss_count_ =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_MISS);
+    compression_dict_hit_count_ =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_HIT);
+    compression_dict_insert_count_ =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_ADD);
+  }
+
   void CheckCacheCounters(const Options& options, size_t expected_misses,
                           size_t expected_hits, size_t expected_inserts,
                           size_t expected_failures) {
@@ -87,6 +99,28 @@
     failure_count_ = new_failure_count;
   }

+  void CheckCacheCountersForCompressionDict(
+      const Options& options, size_t expected_compression_dict_misses,
+      size_t expected_compression_dict_hits,
+      size_t expected_compression_dict_inserts) {
+    size_t new_compression_dict_miss_count =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_MISS);
+    size_t new_compression_dict_hit_count =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_HIT);
+    size_t new_compression_dict_insert_count =
+        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_ADD);
+    ASSERT_EQ(compression_dict_miss_count_ + expected_compression_dict_misses,
+              new_compression_dict_miss_count);
+    ASSERT_EQ(compression_dict_hit_count_ + expected_compression_dict_hits,
+              new_compression_dict_hit_count);
+    ASSERT_EQ(
+        compression_dict_insert_count_ + expected_compression_dict_inserts,
+        new_compression_dict_insert_count);
+    compression_dict_miss_count_ = new_compression_dict_miss_count;
+    compression_dict_hit_count_ = new_compression_dict_hit_count;
+    compression_dict_insert_count_ = new_compression_dict_insert_count;
+  }
+
   void CheckCompressedCacheCounters(const Options& options,
                                     size_t expected_misses,
                                     size_t expected_hits,
@@ -671,6 +705,8 @@ TEST_F(DBBlockCacheTest, CacheCompressionDict) {
     options.table_factory.reset(new BlockBasedTableFactory(table_options));
     DestroyAndReopen(options);

+    RecordCacheCountersForCompressionDict(options);
+
    for (int i = 0; i < kNumFiles; ++i) {
      ASSERT_EQ(i, NumTableFilesAtLevel(0, 0));
      for (int j = 0; j < kNumEntriesPerFile; ++j) {
@@ -683,27 +719,26 @@
     ASSERT_EQ(0, NumTableFilesAtLevel(0));
     ASSERT_EQ(kNumFiles, NumTableFilesAtLevel(1));

+    // Compression dictionary blocks are preloaded.
+    CheckCacheCountersForCompressionDict(
+        options, kNumFiles /* expected_compression_dict_misses */,
+        0 /* expected_compression_dict_hits */,
+        kNumFiles /* expected_compression_dict_inserts */);
+
     // Seek to a key in a file. It should cause the SST's dictionary meta-block
     // to be read.
     RecordCacheCounters(options);
-    ASSERT_EQ(0,
-              TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_MISS));
-    ASSERT_EQ(0, TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_ADD));
-    ASSERT_EQ(
-        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT),
-        0);
+    RecordCacheCountersForCompressionDict(options);
     ReadOptions read_options;
     ASSERT_NE("NOT_FOUND", Get(Key(kNumFiles * kNumEntriesPerFile - 1)));
-    // Two blocks missed/added: dictionary and data block
-    // One block hit: index since it's prefetched
-    CheckCacheCounters(options, 2 /* expected_misses */, 1 /* expected_hits */,
-                       2 /* expected_inserts */, 0 /* expected_failures */);
-    ASSERT_EQ(1,
-              TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_MISS));
-    ASSERT_EQ(1, TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_ADD));
-    ASSERT_GT(
-        TestGetTickerCount(options, BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT),
-        0);
+    // Two block hits: index and dictionary since they are prefetched
+    // One block missed/added: data block
+    CheckCacheCounters(options, 1 /* expected_misses */, 2 /* expected_hits */,
+                       1 /* expected_inserts */, 0 /* expected_failures */);
+    CheckCacheCountersForCompressionDict(
+        options, 0 /* expected_compression_dict_misses */,
+        1 /* expected_compression_dict_hits */,
+        0 /* expected_compression_dict_inserts */);
   }
 }

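The test above reads the new dictionary counters through the test harness; an application can watch the same tickers through the public Statistics API. A small sketch under the assumption that `options.statistics` was set (e.g., via `rocksdb::CreateDBStatistics()`); the helper function name is invented for illustration.

```cpp
#include <iostream>

#include "rocksdb/options.h"
#include "rocksdb/statistics.h"

// Hypothetical helper: dumps the compression dictionary block cache
// tickers exercised by the test above.
void PrintCompressionDictCacheStats(const rocksdb::Options& options) {
  const auto& stats = options.statistics;
  if (!stats) {
    return;  // statistics collection was not enabled
  }
  std::cout
      << "compression dict misses: "
      << stats->getTickerCount(rocksdb::BLOCK_CACHE_COMPRESSION_DICT_MISS)
      << ", hits: "
      << stats->getTickerCount(rocksdb::BLOCK_CACHE_COMPRESSION_DICT_HIT)
      << ", inserts: "
      << stats->getTickerCount(rocksdb::BLOCK_CACHE_COMPRESSION_DICT_ADD)
      << ", bytes inserted: "
      << stats->getTickerCount(
             rocksdb::BLOCK_CACHE_COMPRESSION_DICT_BYTES_INSERT)
      << '\n';
}
```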
6 changes: 0 additions & 6 deletions db/version_set.cc
@@ -3420,16 +3420,10 @@ VersionSet::VersionSet(const std::string& dbname,
       env_options_(storage_options),
       block_cache_tracer_(block_cache_tracer) {}

-void CloseTables(void* ptr, size_t) {
-  TableReader* table_reader = reinterpret_cast<TableReader*>(ptr);
-  table_reader->Close();
-}

 VersionSet::~VersionSet() {
   // we need to delete column_family_set_ because its destructor depends on
   // VersionSet
-  Cache* table_cache = column_family_set_->get_table_cache();
-  table_cache->ApplyToAllCacheEntries(&CloseTables, false /* thread_safe */);
   column_family_set_.reset();
   for (auto& file : obsolete_files_) {
     if (file.metadata->table_reader_handle) {
5 changes: 0 additions & 5 deletions include/rocksdb/cache.h
@@ -250,11 +250,6 @@ class Cache {

   virtual std::string GetPrintableOptions() const { return ""; }

-  // Mark the last inserted object as being a raw data block. This will be used
-  // in tests. The default implementation does nothing.
-  virtual void TEST_mark_as_data_block(const Slice& /*key*/,
-                                       size_t /*charge*/) {}
-
   MemoryAllocator* memory_allocator() const { return memory_allocator_.get(); }

 private:
1 change: 1 addition & 0 deletions src.mk
@@ -121,6 +121,7 @@ LIB_SOURCES = \
   table/block_based/full_filter_block.cc \
   table/block_based/index_builder.cc \
   table/block_based/partitioned_filter_block.cc \
+  table/block_based/uncompression_dict_reader.cc \
   table/block_fetcher.cc \
   table/bloom_block.cc \
   table/cuckoo/cuckoo_table_builder.cc \
7 changes: 1 addition & 6 deletions table/block_based/block_based_filter_block_test.cc
@@ -45,10 +45,7 @@ class TestHashFilter : public FilterPolicy {
 class MockBlockBasedTable : public BlockBasedTable {
  public:
   explicit MockBlockBasedTable(Rep* rep)
-      : BlockBasedTable(rep, nullptr /* block_cache_tracer */) {
-    // Initialize what Open normally does as much as necessary for the test
-    rep->cache_key_prefix_size = 10;
-  }
+      : BlockBasedTable(rep, nullptr /* block_cache_tracer */) {}
 };

 class FilterBlockTest : public testing::Test {
@@ -64,7 +61,6 @@
       : ioptions_(options_),
         env_options_(options_),
         icomp_(options_.comparator) {
-    table_options_.no_block_cache = true;
     table_options_.filter_policy.reset(new TestHashFilter);

     constexpr bool skip_filters = false;
@@ -271,7 +267,6 @@ class BlockBasedFilterBlockTest : public testing::Test {
       : ioptions_(options_),
         env_options_(options_),
         icomp_(options_.comparator) {
-    table_options_.no_block_cache = true;
     table_options_.filter_policy.reset(NewBloomFilterPolicy(10));

     constexpr bool skip_filters = false;