
Improve point-lookup performance using a data block hash index #4174

Closed
fgwu wants to merge 127 commits

Conversation


@fgwu fgwu commented Jul 24, 2018

Summary:

Add hash index support to data blocks, which helps reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless BlockBasedTableOptions::data_block_index_type is set to kDataBlockBinaryAndHash.

The DB size will be larger with the hash index option, as a hash table is appended to the end of each data block. If the hash table utilization ratio is 1:1, the space overhead is one byte per key. The utilization ratio is adjustable via BlockBasedTableOptions::data_block_hash_table_util_ratio. A lower utilization ratio improves point-lookup efficiency further, but takes more space.
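For reference, opting in looks roughly like this (a sketch of the options wiring only, using the option names given above and the usual BlockBasedTableFactory setup):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Sketch: build Options with the data block hash index enabled.
rocksdb::Options MakeHashIndexOptions() {
  rocksdb::BlockBasedTableOptions table_opts;
  // Opt in; the default index type is binary search only.
  table_opts.data_block_index_type =
      rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
  // Lower ratio: sparser hash table, faster point lookups, more space.
  table_opts.data_block_hash_table_util_ratio = 0.75;

  rocksdb::Options options;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```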

Test Plan:

Added unit tests.
Ran make -j32 check and verified that all tests pass.

Some performance numbers. These experiments were run against SSDs.
CPU Util is the percentage of db_bench CPU time spent in the DataBlockIter point-lookup path, as captured by perf.

# large cache 20GB
       index | Throughput |             | fallback | cache miss | DB Space
(util_ratio) |     (MB/s) | CPU Util(%) |    ratio |      ratio |     (GB)
------------ | -----------| ----------- | -------- | ---------- | --------
      binary |        116 |       27.17 |    1.000 |   0.000494 |     5.41
       hash1 |        123 |       22.21 |    0.524 |   0.000502 |     5.59
     hash0.9 |        126 |       22.89 |    0.559 |   0.000502 |     5.61
     hash0.8 |        129 |       21.65 |    0.487 |   0.000504 |     5.63
     hash0.7 |        127 |       21.12 |    0.463 |   0.000504 |     5.65
     hash0.6 |        130 |       20.62 |    0.423 |   0.000506 |     5.69
     hash0.5 |        132 |       19.34 |    0.311 |   0.000510 |     5.75


# small cache 1GB
       index | Throughput |             | fallback | cache miss | DB Space
(util_ratio) |     (MB/s) | CPU Util(%) |    ratio |      ratio |     (GB)
------------ | -----------| ----------- | -------- | ---------- | --------
      binary |       26.8 |        2.02 |    1.000 |   0.923345 |     5.41
       hash1 |       25.9 |        1.49 |    0.524 |   0.924571 |     5.59
     hash0.9 |       27.5 |        1.59 |    0.559 |   0.924561 |     5.61
     hash0.8 |       27.4 |        1.52 |    0.487 |   0.924868 |     5.63
     hash0.7 |       27.7 |        1.44 |    0.463 |   0.924858 |     5.65
     hash0.6 |       26.8 |        1.36 |    0.423 |   0.925160 |     5.69
     hash0.5 |       28.0 |        1.22 |    0.311 |   0.925779 |     5.75

We also compared against the master branch that this feature PR is based on, to make sure there is no performance regression in the default binary-seek case. These experiments were run against tmpfs, without perf.

master: b271f956c Fix a TSAN failure (#4250)
feature: bf411a50b DataBlockHashIndex: inline SeekForGet() to speedup the fallback path

# large cache 20GB
    branch | Throughput | cache miss | DB Space ||       branch | Throughput | cache miss | DB Space
      #run |     (MB/s) |      ratio |     (GB) ||         #run |     (MB/s) |      ratio |     (GB)
---------- | -----------| ---------- | -------- || ------------ | -----------| ---------- | --------
master/1   |      127.5 |   0.000494 |     5.41 ||  feature/1   |      129.9 |   0.000494 |     5.41
master/2   |      130.7 |   0.000494 |     5.41 ||  feature/2   |      126.3 |   0.000494 |     5.41
master/3   |      128.7 |   0.000494 |     5.41 ||  feature/3   |      128.7 |   0.000494 |     5.41
master/4   |      105.4 |   0.000494 |     5.41 ||  feature/4   |      131.1 |   0.000494 |     5.41
master/5   |      135.8 |   0.000494 |     5.41 ||  feature/5   |      132.7 |   0.000494 |     5.41
master/avg |      125.6 |   0.000494 |     5.41 ||  feature/avg |      129.7 |   0.000494 |     5.41


# small cache 1GB
    branch | Throughput | cache miss | DB Space ||       branch | Throughput | cache miss | DB Space
      #run |     (MB/s) |      ratio |     (GB) ||         #run |     (MB/s) |      ratio |     (GB)
---------- | -----------| ---------- | -------- || ------------ | -----------| ---------- | --------
master/1   |       36.9 |   0.923190 |     5.41 ||  feature/1   |       37.1 |   0.923189 |     5.41
master/2   |       36.8 |   0.923184 |     5.41 ||  feature/2   |       35.8 |   0.923196 |     5.41
master/3   |       35.8 |   0.923190 |     5.41 ||  feature/3   |       36.4 |   0.923183 |     5.41
master/4   |       27.8 |   0.923200 |     5.41 ||  feature/4   |       36.6 |   0.923191 |     5.41
master/5   |       37.7 |   0.923162 |     5.41 ||  feature/5   |       36.7 |   0.923141 |     5.41
master/avg |       35.0 |   0.923185 |     5.41 ||  feature/avg |       36.5 |   0.923180 |     5.41

# benchmarking command
# setting: num=200 million, reads=100 million, key_size=8B, value_size=40B, threads=16
$DB_BENCH  --data_block_index_type=${block_index} \
           --db=${db} \
           --block_size=16000 --level_compaction_dynamic_level_bytes=1 \
           --num=$num \
           --key_size=$ks \
           --value_size=$vs \
           --benchmarks=fillseq --compression_type=snappy \
           --statistics=false --block_restart_interval=1 \
           --compression_ratio=0.4 \
           --data_block_hash_table_util_ratio=${util_ratio} \
           --statistics=true \
           >${write_log}

$DB_BENCH  --data_block_index_type=${block_index} \
           --db=${db} \
           --block_size=16000 --level_compaction_dynamic_level_bytes=1 \
           --use_existing_db=true \
           --num=${num} \
           --reads=${reads} \
           --key_size=$ks \
           --value_size=$vs \
           --benchmarks=readtocache,readrandom \
           --compression_type=snappy \
           --block_restart_interval=16 \
           --compression_ratio=0.4 \
           --cache_size=${cache_size} \
           --data_block_hash_table_util_ratio=${util_ratio} \
           --use_direct_reads \
           --disable_auto_compactions \
           --threads=${threads} \
           --statistics=true \
           > ${read_log}

fgwu added 12 commits July 23, 2018 09:14
Summary:
The first step of the DataBlockHashIndex implementation. A string
based hash table is implemented and unit-tested.

DataBlockHashIndexBuilder: Add() takes pairs of
<key, restart_index>, and formats them into a string when Finish()
is called.
DataBlockHashIndex: initialized from the formatted string, which it
interprets as a hash table. Supports Seek().

Test Plan:
Unit test: data_block_hash_index_test
make check -j 32

Reviewers:
Sagar Vemuri
Summary:
The Seek() in the initial implementation is inefficient in that it
needs vector creation and emplace operations, which can be eliminated
by an iteration-based implementation.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

table/block.cc Outdated
assert(data_block_hash_index_);
Slice user_key = ExtractUserKey(target);
std::unique_ptr<DataBlockHashIndexIterator> data_block_hash_iter(
data_block_hash_index_->NewIterator(user_key));
We need to get rid of this malloc here.

@fgwu fgwu Jul 24, 2018

I could embed an iterator instance in the DataBlockIter to avoid the extra malloc, and initialize it every time I use it. Would this work?

The trade-off: a new DataBlockIter is malloc-ed every time BlockBasedTable::Get() is called. If I can embed the DataBlockHashIndexIterator into DataBlockIter, we save half of the mallocs but lose a little space.

Please don't do that.
Instead, you can allocate it on the stack, something like what we do in DataBlockIterator:

DataBlockHashIndexIterator iter;
data_block_hash_index_->NewIterator(user_key, &iter);

@fgwu fgwu Jul 24, 2018

Another way is to bypass the BlockIter interface entirely, and have a new call path from BlockBasedTable::Get() to Block::Get() (TODO). I think this is doable, and it also solves the extra memory access from the restart index to the restart offset, as described below.

table/block.cc Outdated

for (; data_block_hash_iter->Valid(); data_block_hash_iter->Next()) {
uint32_t restart_index = data_block_hash_iter->Value();
SeekToRestartPoint(restart_index);
Think of how to get rid of this extra memory access.

table/block.cc Outdated
// 2) (a key larger than user_key) + seq + type
break;
}
}
Can you explain how this works?
Assume we have the following keys: 1, 2, 3, ..., 99.
Let's say one bucket contains two keys, 22 and 88, and you are seeking "22" and land in this bucket. Do you go through this for-loop from 22, 23, ... all the way to 88?

@fgwu:

Short answer: the for-loop returns as soon as it finds the first key_ that matches the seek key, without further looping over the remaining keys in the bucket.

More details: the entries stored in a bucket are a checksum of the key plus the index of the restart interval it resides in (or the restart offset).

Say key01, key02, ..., key99 are in the block, and key22 and key88 are hashed to the same bucket, among other keys. Assume they belong to different restart intervals, R2 and R8.

When seeking key22, we hash the key to find the bucket. There can be many entries in the bucket. The iterator only emits entries with a matching checksum, i.e., potential matches. We still have to linearly search the restart interval to see whether the key is really there, since some other key hashed to this bucket may happen to have an identical checksum.

In the example, say key22 and key88 have the same checksum. The outer for-loop then iterates over the two entries for key22 and key88. In the inner while-loop, it finds the entry for key22, jumps to restart interval R2, and performs a linear search within it. It breaks out of the inner while-loop when key_ >= user_key. A following check makes sure the user key matches. On a match, the function returns without another iteration of the for-loop on the bucket entry corresponding to key88.

@facebook-github-bot
@fgwu has updated the pull request.

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 637b33c to 8462db0 on July 25, 2018 18:57

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 8462db0 to 995c790 on July 25, 2018 19:08
…tor()

Summary:
Move the iterator memory to the stack.
Address AppVeyor's complaint about the type cast from size_t to uint16_t.
@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 995c790 to 8e14351 on July 25, 2018 21:05
@facebook-github-bot

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email cla@fb.com with your details so we can update your status.

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 5722699 to 82d61b1 on August 14, 2018 20:56
@sagar0 sagar0 left a comment

Thanks, this looks great. Let's get this in.
Suggested changes before landing:

  1. Update the function name in comparator.h
  2. Update summary with latest numbers.
  3. Update Test plan section.

// as equal by this comparator.
// The major use case is to determine if DataBlockHashIndex is compatible
// with the customized comparator.
virtual bool CanKeysWithDifferentByteContentsEqual() const { return true; }
Nit (as this is in the public API): add a 'Be' in between:
s/CanKeysWithDifferentByteContentsEqual/CanKeysWithDifferentByteContentsBeEqual/
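To illustrate why this flag matters, consider a hypothetical case-insensitive comparator (invented for this example, not part of the PR): it treats byte-different keys as equal, so it would have to return true here, and the hash index, which hashes raw key bytes, could not be used with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical case-insensitive ordering: "Key" and "KEY" compare equal
// even though their bytes differ. A comparator with this ordering must
// report that keys with different byte contents can be equal, which
// makes it incompatible with a hash index over raw key bytes.
int CaseInsensitiveCompare(const std::string& a, const std::string& b) {
  size_t n = std::min(a.size(), b.size());
  for (size_t i = 0; i < n; ++i) {
    int ca = std::tolower(static_cast<unsigned char>(a[i]));
    int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}
```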

@fgwu fgwu changed the title DataBlockHashIndex: Adding HashIndex feature to Block and BlockBuilder Improve point-lookup performance using a data block hash index Aug 15, 2018
fgwu is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

rcane pushed a commit to rcane/rocksdb that referenced this pull request Sep 13, 2018
…ook#4174)

Pull Request resolved: facebook#4174

Differential Revision: D8965914

Pulled By: fgwu

fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198