Create a BlockCacheLookupContext to enable fine-grained block cache tracing. #5421

HaoyuHuang · 2019-06-06T21:53:17Z

BlockCacheLookupContext only contains the caller for now.
We will trace block accesses at five places:

BlockBasedTable::GetFilter.
BlockBasedTable::GetUncompressedDict.
BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.)
BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)

We create the context at:

BlockBasedTable::Get. (kUserGet)
BlockBasedTable::MultiGet. (kUserMGet)
BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.)
BlockBasedTable::Open. (kPrefetch)
Index/Filter::CacheDependencies. (kPrefetch)
BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).

I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.
Throughput of this PR: 231334 ops/s.
Throughput of the master branch: 238428 ops/s.

Experiment setup:
RocksDB: version 6.2
Date: Mon Jun 10 10:42:51 2019
CPU: 24 * Intel Core Processor (Skylake)
CPUCache: 16384 KB
Keys: 20 bytes each
Values: 100 bytes each (100 bytes after compression)
Entries: 1000000
Prefix: 20 bytes
Keys per prefix: 0
RawSize: 114.4 MB (estimated)
FileSize: 114.4 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1

Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120

TODOs:

Create a caller for external SST file ingestion and differentiate the callers for iterator.
Integrate tracer to trace block cache accesses.

…racing.

facebook-github-bot

@HaoyuHuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ltamasi

One general comment regarding BlockBasedTable: the patch in general adds the BlockCacheLookupContext* parameter as the first parameter of methods. Since this tracing is auxiliary functionality, I feel a better place would be closer to the end of the parameter list, e.g. after GetContext* where applicable, similarly to how it's done in the FilterBlockReader classes.

riversand963

Taken a first pass. Can we remove the usage of default value of function arguments?

db/version_set.h

…_flush Summary: Pull Request resolved: facebook#5422 Differential Revision: D15706212 Pulled By: ltamasi fbshipit-source-id: 0acf060fb8568efee51c033e50b492bcf1095a4c

facebook-github-bot · 2019-06-07T00:01:36Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot · 2019-06-07T00:16:32Z

@HaoyuHuang has updated the pull request. Re-import the pull request

HaoyuHuang · 2019-06-07T00:19:11Z

I moved BlockCacheLookupContext to be after GetContext in functions where there is a GetContext. Otherwise, I moved it to be the last function argument.
I removed the default values of functions arguments for the functions that this PR changes.

facebook-github-bot

@HaoyuHuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: Special characters like slashes and parentheses are not supported. Pull Request resolved: facebook#5424 Differential Revision: D15708067 Pulled By: ltamasi fbshipit-source-id: 90527ec3ee882a0cdd1249c3946f5eff2ff7c115

riversand963

Another pass with a few comments.

table/block_based/block_based_filter_block_test.cc

table/block_based/partitioned_filter_block.h

table/block_based/block_based_table_reader.cc

facebook-github-bot · 2019-06-07T17:55:00Z

@HaoyuHuang has updated the pull request. Re-import the pull request

…bleIterator (facebook#5428) Summary: PR facebook#5111 reduced the number of key comparisons when iterating with upper/lower bounds; however, this caused a regression for MyRocks. Reverting to the previous behavior in BlockBasedTableIterator as a hotfix. Pull Request resolved: facebook#5428 Differential Revision: D15721038 Pulled By: ltamasi fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c

…error (facebook#5412) Summary: I'm not able to prove it, but the stress test failure may be caused by the following sequence of events - 1. Crash db_stress while writing the log file. This should result in a corrupted WAL. 2. Run db_stress with recycle_log_file_num=1. Crash during recovery immediately after writing manifest and updating the current file. The old log from the previous run is left behind, but the memtable would have been flushed during recovery and the CF log number will point to the newer log 3. Run db_stress with recycle_log_file_num=0. During recovery, the old log file will be processed and the corruption will be detected. Since the CF has moved ahead, we get the "SST file is ahead of WAL" error Test - 1. stress_crash 2. make check Pull Request resolved: facebook#5412 Differential Revision: D15699120 Pulled By: anand1976 fbshipit-source-id: 9092ce81e7c4a0b4b4e66560c23ea4812a4d9cbe

riversand963 · 2019-06-08T00:31:40Z

Can you run some in-memory benchmark comparing the performance with that of master, and make sure there is no regression.
Update:
with db_bench will be good enough for now.

HaoyuHuang · 2019-06-08T00:39:20Z

Sure. I can do that. I will give an update here on Monday.

…lt type (facebook#5432) Summary: This affects some TSAN builds: env/env_test.cc: In member function ‘virtual void rocksdb::EnvPosixTestWithParam_MultiRead_Test::TestBody()’: env/env_test.cc:1126:76: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers] auto data = NewAligned(kSectorSize * 8, static_cast<const char>(i + 1)); ^ env/env_test.cc:1154:77: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers] auto buf = NewAligned(kSectorSize * 8, static_cast<const char>(i*2 + 1)); ^ Pull Request resolved: facebook#5432 Differential Revision: D15727277 Pulled By: ltamasi fbshipit-source-id: dc0e687b123e7c4d703ccc0c16b7167e07d1c9b0

HaoyuHuang · 2019-06-09T20:27:32Z

I loaded 1 million key-value pairs into the database and ran the readrandom benchmark. I gave the block cache 1 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.
Throughput of this PR: 231334 ops/s.
Throughput of the master branch: 238428 ops/s.

Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

Summary: The patch reduces the contention over prepared_mutex_ using these techniques: 1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues. 2) Use two separate mutex for PreparedHeap, one prepared_mutex_ needed for ::RemovePrepared, and one ::push_pop_mutex() needed for ::AddPrepared(). Given that we call ::AddPrepared only from the first write queue and ::RemovePrepared mostly from the 2nd, this will result into each the two write queues not competing with each other over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up with calling ::pop() 3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last. Pull Request resolved: facebook#5420 Differential Revision: D15741985 Pulled By: maysamyabandeh fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735

riversand963

Mostly LGTM. Left a few minor comments.

riversand963 · 2019-06-10T17:33:27Z

table/block_based/block_based_table_reader.h

  Status ReadCompressionDictBlock(
      FilePrefetchBuffer* prefetch_buffer,
      std::unique_ptr<const BlockContents>* compression_dict_block) const;
  Status PrefetchIndexAndFilterBlocks(
+


No empty line please.

My bad. Removed.

riversand963 · 2019-06-10T18:31:24Z

table/block_based/block_based_table_reader.cc

@@ -1360,9 +1376,11 @@ Status BlockBasedTable::ReadCompressionDictBlock(
 }

 Status BlockBasedTable::PrefetchIndexAndFilterBlocks(
+


No empty line please.

riversand963 · 2019-06-10T19:11:32Z

table/block_based/block_based_table_reader.cc

@@ -3052,6 +3110,8 @@ Status BlockBasedTable::Prefetch(const Slice* const begin,
 }

 Status BlockBasedTable::VerifyChecksum() {
+  // TODO(haoyu): This function is called by external sst ingestion and user.


External sst ingestion is also called by user.

I updated the comments.

…cebook#5413) Summary: In regular RocksDB instance, `MemTable::earliest_seqno_` is "db sequence number at the time of creation". However, we cannot use the db sequence number to set the value of `MemTable::earliest_seqno_` for secondary instance, i.e. `DBImplSecondary` due to the logic of MANIFEST and WAL replay. When replaying the log files of the primary, the secondary instance first replays MANIFEST and updates the db sequence number if necessary. Next, the secondary replays WAL files, creates new memtables if necessary and inserts key-value pairs into memtables. The following can occur when the db has two or more column families. Assume the db has column family "default" and "cf1". At a certain in time, both "default" and "cf1" have data in memtables. 1. Primary triggers a flush and flushes "cf1". "default" is **not** flushed. 2. Secondary replays the MANIFEST updates its db sequence number to the latest value learned from the MANIFEST. 3. Secondary starts to replay WAL that contains the writes to "default". It is possible that the write batches' sequence numbers are smaller than the db sequence number. In this case, these write batches will be skipped, and these updates will not be visible to reader until "default" is later flushed. Pull Request resolved: facebook#5413 Differential Revision: D15637407 Pulled By: riversand963 fbshipit-source-id: 3de3fe35cfc6f1b9f844f3f926f0df29717b6580

facebook-github-bot · 2019-06-10T20:02:18Z

@HaoyuHuang has updated the pull request. Re-import the pull request

…ook#5314) Summary: Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement. TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Without the change - multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found) With the change - multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found) Pull Request resolved: facebook#5314 Differential Revision: D15742108 Pulled By: anand1976 fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771

riversand963

LGTM.

…racing.

facebook-github-bot · 2019-06-10T21:48:42Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot

@HaoyuHuang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2019-06-10T23:11:21Z

@HaoyuHuang merged this pull request in 5efa0d6.

…racing. (facebook#5421) Summary: BlockCacheLookupContext only contains the caller for now. We will trace block accesses at five places: 1. BlockBasedTable::GetFilter. 2. BlockBasedTable::GetUncompressedDict. 3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.) 4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.) 5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.) We create the context at: 1. BlockBasedTable::Get. (kUserGet) 2. BlockBasedTable::MultiGet. (kUserMGet) 3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.) 4. BlockBasedTable::Open. (kPrefetch) 5. Index/Filter::CacheDependencies. (kPrefetch) 6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize). I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable. Throughput of this PR: 231334 ops/s. Throughput of the master branch: 238428 ops/s. Experiment setup: RocksDB: version 6.2 Date: Mon Jun 10 10:42:51 2019 CPU: 24 * Intel Core Processor (Skylake) CPUCache: 16384 KB Keys: 20 bytes each Values: 100 bytes each (100 bytes after compression) Entries: 1000000 Prefix: 20 bytes Keys per prefix: 0 RawSize: 114.4 MB (estimated) FileSize: 114.4 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1 Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120 TODOs: 1. Create a caller for external SST file ingestion and differentiate the callers for iterator. 2. Integrate tracer to trace block cache accesses. Pull Request resolved: facebook#5421 Differential Revision: D15704258 Pulled By: HaoyuHuang fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a

Create a BlockCacheLookupContext to enable fine-grained block cache t…

7fb2a7c

…racing.

facebook-github-bot added the CLA Signed label Jun 6, 2019

HaoyuHuang requested review from ltamasi and riversand963 June 6, 2019 21:53

facebook-github-bot reviewed Jun 6, 2019

View reviewed changes

ltamasi reviewed Jun 6, 2019

View reviewed changes

riversand963 reviewed Jun 6, 2019

View reviewed changes

db/version_set.h Outdated Show resolved Hide resolved

db/version_set.h Outdated Show resolved Hide resolved

db/version_set.h Outdated Show resolved Hide resolved

ltamasi and others added 2 commits June 6, 2019 16:17

Remove the artifacts field from stress_crash/stress_crash_with_atomic…

fd94353

…_flush Summary: Pull Request resolved: facebook#5422 Differential Revision: D15706212 Pulled By: ltamasi fbshipit-source-id: 0acf060fb8568efee51c033e50b492bcf1095a4c

Address comments

13a5a41

Update a few comments

2624276

HaoyuHuang requested review from riversand963 and ltamasi and removed request for riversand963 June 7, 2019 00:20

facebook-github-bot reviewed Jun 7, 2019

View reviewed changes

Remove special characters from job names (facebook#5424)

ad52626

Summary: Special characters like slashes and parentheses are not supported. Pull Request resolved: facebook#5424 Differential Revision: D15708067 Pulled By: ltamasi fbshipit-source-id: 90527ec3ee882a0cdd1249c3946f5eff2ff7c115

riversand963 reviewed Jun 7, 2019

View reviewed changes

Address comments

204f833

HaoyuHuang requested a review from riversand963 June 7, 2019 17:58

ltamasi and others added 2 commits June 7, 2019 15:17

riversand963 reviewed Jun 10, 2019

View reviewed changes

riversand963 and others added 2 commits June 10, 2019 12:58

Address comments

d506d3b

HaoyuHuang requested a review from riversand963 June 10, 2019 20:27

riversand963 approved these changes Jun 10, 2019

View reviewed changes

HaoyuHuang added 7 commits June 10, 2019 14:23

Create a BlockCacheLookupContext to enable fine-grained block cache t…

e8ffbe3

…racing.

Address comments

6a44403

Update a few comments

400c46d

Address comments

927279c

Address comments

6129441

Merge conflicts

de24d30

Merge conflicts

19bc915

facebook-github-bot reviewed Jun 10, 2019

View reviewed changes

facebook-github-bot closed this in 5efa0d6 Jun 10, 2019

facebook-github-bot added the Merged label Jun 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a BlockCacheLookupContext to enable fine-grained block cache tracing. #5421

Create a BlockCacheLookupContext to enable fine-grained block cache tracing. #5421

HaoyuHuang commented Jun 6, 2019 •

edited

Loading

facebook-github-bot left a comment

ltamasi left a comment

riversand963 left a comment

facebook-github-bot commented Jun 7, 2019

facebook-github-bot commented Jun 7, 2019

HaoyuHuang commented Jun 7, 2019

facebook-github-bot left a comment

riversand963 left a comment

facebook-github-bot commented Jun 7, 2019

riversand963 commented Jun 8, 2019 •

edited

Loading

HaoyuHuang commented Jun 8, 2019

HaoyuHuang commented Jun 9, 2019

riversand963 left a comment

riversand963 Jun 10, 2019

HaoyuHuang Jun 10, 2019

riversand963 Jun 10, 2019

HaoyuHuang Jun 10, 2019

riversand963 Jun 10, 2019

HaoyuHuang Jun 10, 2019

facebook-github-bot commented Jun 10, 2019

riversand963 left a comment

facebook-github-bot commented Jun 10, 2019

facebook-github-bot left a comment

facebook-github-bot commented Jun 10, 2019

		@@ -1360,9 +1376,11 @@ Status BlockBasedTable::ReadCompressionDictBlock(
		}

		Status BlockBasedTable::PrefetchIndexAndFilterBlocks(

Create a BlockCacheLookupContext to enable fine-grained block cache tracing. #5421

Create a BlockCacheLookupContext to enable fine-grained block cache tracing. #5421

Conversation

HaoyuHuang commented Jun 6, 2019 • edited Loading

facebook-github-bot left a comment

Choose a reason for hiding this comment

ltamasi left a comment

Choose a reason for hiding this comment

riversand963 left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Jun 7, 2019

facebook-github-bot commented Jun 7, 2019

HaoyuHuang commented Jun 7, 2019

facebook-github-bot left a comment

Choose a reason for hiding this comment

riversand963 left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Jun 7, 2019

riversand963 commented Jun 8, 2019 • edited Loading

HaoyuHuang commented Jun 8, 2019

HaoyuHuang commented Jun 9, 2019

riversand963 left a comment

Choose a reason for hiding this comment

riversand963 Jun 10, 2019

Choose a reason for hiding this comment

HaoyuHuang Jun 10, 2019

Choose a reason for hiding this comment

riversand963 Jun 10, 2019

Choose a reason for hiding this comment

HaoyuHuang Jun 10, 2019

Choose a reason for hiding this comment

riversand963 Jun 10, 2019

Choose a reason for hiding this comment

HaoyuHuang Jun 10, 2019

Choose a reason for hiding this comment

facebook-github-bot commented Jun 10, 2019

riversand963 left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Jun 10, 2019

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Jun 10, 2019

HaoyuHuang commented Jun 6, 2019 •

edited

Loading

riversand963 commented Jun 8, 2019 •

edited

Loading