
Improve point-lookup performance using a data block hash index #4174

Closed
fgwu wants to merge 127 commits

Conversation


@fgwu fgwu commented Jul 24, 2018

Summary:

Add hash index support to data blocks, which helps reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless BlockBasedTableOptions::data_block_index_type is set to kDataBlockBinaryAndHash.

The DB size will be larger with the hash index option, as a hash table is appended to the end of each data block. If the hash table utilization ratio is 1:1, the space overhead is one byte per key. The utilization ratio is adjustable via BlockBasedTableOptions::data_block_hash_table_util_ratio. A lower utilization ratio improves point-lookup efficiency further, but takes more space.
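For reference, opting in looks roughly like this (a sketch of the options wiring only, using the option names given above and the usual BlockBasedTableFactory setup):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// Sketch: build Options with the data block hash index enabled.
rocksdb::Options MakeHashIndexOptions() {
  rocksdb::BlockBasedTableOptions table_opts;
  // Opt in; the default index type is binary search only.
  table_opts.data_block_index_type =
      rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
  // Lower ratio: sparser hash table, faster point lookups, more space.
  table_opts.data_block_hash_table_util_ratio = 0.75;

  rocksdb::Options options;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```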

Test Plan:

Added unit tests.
Ran make -j32 check and verified that all tests pass.

Some performance numbers. These experiments were run against SSDs.
CPU Util is the percentage of db_bench CPU time spent in the DataBlockIter point-lookup path, as captured by perf.

# large cache 20GB
       index | Throughput |             | fallback | cache miss | DB Space
(util_ratio) |     (MB/s) | CPU Util(%) |    ratio |      ratio |     (GB)
------------ | -----------| ----------- | -------- | ---------- | --------
      binary |        116 |       27.17 |    1.000 |   0.000494 |     5.41
       hash1 |        123 |       22.21 |    0.524 |   0.000502 |     5.59
     hash0.9 |        126 |       22.89 |    0.559 |   0.000502 |     5.61
     hash0.8 |        129 |       21.65 |    0.487 |   0.000504 |     5.63
     hash0.7 |        127 |       21.12 |    0.463 |   0.000504 |     5.65
     hash0.6 |        130 |       20.62 |    0.423 |   0.000506 |     5.69
     hash0.5 |        132 |       19.34 |    0.311 |   0.000510 |     5.75


# small cache 1GB
       index | Throughput |             | fallback | cache miss | DB Space
(util_ratio) |     (MB/s) | CPU Util(%) |    ratio |      ratio |     (GB)
------------ | -----------| ----------- | -------- | ---------- | --------
      binary |       26.8 |        2.02 |    1.000 |   0.923345 |     5.41
       hash1 |       25.9 |        1.49 |    0.524 |   0.924571 |     5.59
     hash0.9 |       27.5 |        1.59 |    0.559 |   0.924561 |     5.61
     hash0.8 |       27.4 |        1.52 |    0.487 |   0.924868 |     5.63
     hash0.7 |       27.7 |        1.44 |    0.463 |   0.924858 |     5.65
     hash0.6 |       26.8 |        1.36 |    0.423 |   0.925160 |     5.69
     hash0.5 |       28.0 |        1.22 |    0.311 |   0.925779 |     5.75

We also compared against the master branch that this feature PR is based on, to make sure there is no performance regression in the default binary-seek case. These experiments were run against tmpfs, without perf.

master: b271f956c Fix a TSAN failure (#4250)
feature: bf411a50b DataBlockHashIndex: inline SeekForGet() to speedup the fallback path

# large cache 20GB
    branch | Throughput | cache miss | DB Space ||       branch | Throughput | cache miss | DB Space
      #run |     (MB/s) |      ratio |     (GB) ||         #run |     (MB/s) |      ratio |     (GB)
---------- | -----------| ---------- | -------- || ------------ | -----------| ---------- | --------
master/1   |      127.5 |   0.000494 |     5.41 ||  feature/1   |      129.9 |   0.000494 |     5.41
master/2   |      130.7 |   0.000494 |     5.41 ||  feature/2   |      126.3 |   0.000494 |     5.41
master/3   |      128.7 |   0.000494 |     5.41 ||  feature/3   |      128.7 |   0.000494 |     5.41
master/4   |      105.4 |   0.000494 |     5.41 ||  feature/4   |      131.1 |   0.000494 |     5.41
master/5   |      135.8 |   0.000494 |     5.41 ||  feature/5   |      132.7 |   0.000494 |     5.41
master/avg |      125.6 |   0.000494 |     5.41 ||  feature/avg |      129.7 |   0.000494 |     5.41


# small cache 1GB
    branch | Throughput | cache miss | DB Space ||       branch | Throughput | cache miss | DB Space
      #run |     (MB/s) |      ratio |     (GB) ||         #run |     (MB/s) |      ratio |     (GB)
---------- | -----------| ---------- | -------- || ------------ | -----------| ---------- | --------
master/1   |       36.9 |   0.923190 |     5.41 ||  feature/1   |       37.1 |   0.923189 |     5.41
master/2   |       36.8 |   0.923184 |     5.41 ||  feature/2   |       35.8 |   0.923196 |     5.41
master/3   |       35.8 |   0.923190 |     5.41 ||  feature/3   |       36.4 |   0.923183 |     5.41
master/4   |       27.8 |   0.923200 |     5.41 ||  feature/4   |       36.6 |   0.923191 |     5.41
master/5   |       37.7 |   0.923162 |     5.41 ||  feature/5   |       36.7 |   0.923141 |     5.41
master/avg |       35.0 |   0.923185 |     5.41 ||  feature/avg |       36.5 |   0.923180 |     5.41

# benchmarking command
# setting: num=200 million, reads=100 million, key_size=8B, value_size=40B, threads=16
$DB_BENCH  --data_block_index_type=${block_index} \
           --db=${db} \
           --block_size=16000 --level_compaction_dynamic_level_bytes=1 \
           --num=$num \
           --key_size=$ks \
           --value_size=$vs \
           --benchmarks=fillseq --compression_type=snappy \
           --statistics=false --block_restart_interval=1 \
           --compression_ratio=0.4 \
           --data_block_hash_table_util_ratio=${util_ratio} \
           --statistics=true \
           >${write_log}

$DB_BENCH  --data_block_index_type=${block_index} \
           --db=${db} \
           --block_size=16000 --level_compaction_dynamic_level_bytes=1 \
           --use_existing_db=true \
           --num=${num} \
           --reads=${reads} \
           --key_size=$ks \
           --value_size=$vs \
           --benchmarks=readtocache,readrandom \
           --compression_type=snappy \
           --block_restart_interval=16 \
           --compression_ratio=0.4 \
           --cache_size=${cache_size} \
           --data_block_hash_table_util_ratio=${util_ratio} \
           --use_direct_reads \
           --disable_auto_compactions \
           --threads=${threads} \
           --statistics=true \
           > ${read_log}

fgwu added 12 commits July 23, 2018 09:14
Summary:
The first step of the DataBlockHashIndex implementation. A string
based hash table is implemented and unit-tested.

DataBlockHashIndexBuilder: Add() takes pairs of
<key, restart_index>, and formats them into a string when Finish()
is called.
DataBlockHashIndex: initialized from the formatted string, which it
interprets as a hash table. Supports Seek().

Test Plan:
Unit test: data_block_hash_index_test
make check -j 32

Reviewers:
Sagar Vemuri
Summary:
The Seek() in the initial implementation is inefficient in that it
needs vector creation and emplace operations, which can be eliminated
by an iteration-based implementation.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

table/block.cc Outdated
assert(data_block_hash_index_);
Slice user_key = ExtractUserKey(target);
std::unique_ptr<DataBlockHashIndexIterator> data_block_hash_iter(
data_block_hash_index_->NewIterator(user_key));
We need to get rid of this malloc here.

@fgwu fgwu Jul 24, 2018

I could embed an iterator instance in the DataBlockIter to avoid the extra malloc, and initialize it every time I use it. Would this work?

The trade-off: a new DataBlockIter is malloc-ed every time BlockBasedTable::Get() is called. If I can embed the DataBlockHashIndexIterator into DataBlockIter, we save half of the mallocs but lose a little space.

Please don't do that.
Instead, you can allocate it on the stack, something like what we do in DataBlockIterator:

DataBlockHashIndexIterator iter;
data_block_hash_index_->NewIterator(user_key, &iter);

@fgwu fgwu Jul 24, 2018

Another way is to bypass the BlockIter interface entirely, and have a new call path from BlockBasedTable::Get() to Block::Get() (TODO). I think this is doable, and it also solves the extra memory access from the restart index to the restart offset, as described below.

table/block.cc Outdated

for (; data_block_hash_iter->Valid(); data_block_hash_iter->Next()) {
uint32_t restart_index = data_block_hash_iter->Value();
SeekToRestartPoint(restart_index);
Think of how to get rid of this extra memory access.

table/block.cc Outdated
// 2) (a key larger than user_key) + seq + type
break;
}
}
Can you explain how this works?
Assume we have the following keys: 1, 2, 3, ..., 99.
Let's say one bucket contains two keys, 22 and 88, and you are seeking "22" and land in this bucket. Do you go through this for-loop from 22, 23, ... all the way to 88?

@fgwu:

Short answer: the for-loop returns as soon as it finds the first key_ that matches the seek key, without further looping over the remaining keys in the bucket.

More details: the entries stored in a bucket are a checksum of the key plus the index of the restart interval it resides in (or the restart offset).

Say key01, key02, ..., key99 are in the block, and key22 and key88 are hashed to the same bucket, among other keys. Assume they belong to different restart intervals, R2 and R8.

When seeking key22, we hash the key to find the bucket. There can be many entries in the bucket. The iterator only emits entries with a matching checksum, i.e., potential matches. We still have to linearly search the restart interval to see whether the key is really there, since some other key hashed to this bucket may happen to have an identical checksum.

In the example, say key22 and key88 have the same checksum. The outer for-loop then iterates over the two entries for key22 and key88. In the inner while-loop, it finds the entry for key22, jumps to restart interval R2, and performs a linear search within it. It breaks out of the inner while-loop when key_ >= user_key. A following check makes sure the user key matches. On a match, the function returns without another iteration of the for-loop on the bucket entry corresponding to key88.

@facebook-github-bot
@fgwu has updated the pull request.

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 637b33c to 8462db0 on July 25, 2018 18:57

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 8462db0 to 995c790 on July 25, 2018 19:08
…tor()

Summary:
Move the iterator memory to the stack.
Address AppVeyor's complaint about the type cast from size_t to uint16_t.
@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 995c790 to 8e14351 on July 25, 2018 21:05
@facebook-github-bot

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email cla@fb.com with your details so we can update your status.

@fgwu force-pushed the fwu_data_block_hash_index_block_level branch from 5722699 to 82d61b1 on August 14, 2018 20:56
@sagar0 sagar0 left a comment

Thanks, this looks great. Let's get this in.
Suggested changes before landing:

  1. Update the function name in comparator.h
  2. Update summary with latest numbers.
  3. Update Test plan section.

// as equal by this comparator.
// The major use case is to determine if DataBlockHashIndex is compatible
// with the customized comparator.
virtual bool CanKeysWithDifferentByteContentsEqual() const { return true; }
Nit (as this is in the public API): add a 'Be' in between:
s/CanKeysWithDifferentByteContentsEqual/CanKeysWithDifferentByteContentsBeEqual/
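To illustrate why this flag matters, consider a hypothetical case-insensitive comparator (invented for this example, not part of the PR): it treats byte-different keys as equal, so it would have to return true here, and the hash index, which hashes raw key bytes, could not be used with it.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical case-insensitive ordering: "Key" and "KEY" compare equal
// even though their bytes differ. A comparator with this ordering must
// report that keys with different byte contents can be equal, which
// makes it incompatible with a hash index over raw key bytes.
int CaseInsensitiveCompare(const std::string& a, const std::string& b) {
  size_t n = std::min(a.size(), b.size());
  for (size_t i = 0; i < n; ++i) {
    int ca = std::tolower(static_cast<unsigned char>(a[i]));
    int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}
```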

@fgwu fgwu changed the title DataBlockHashIndex: Adding HashIndex feature to Block and BlockBuilder Improve point-lookup performance using a data block hash index Aug 15, 2018
fgwu is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

rcane pushed a commit to rcane/rocksdb that referenced this pull request Sep 13, 2018
…ook#4174)

Pull Request resolved: facebook#4174

Differential Revision: D8965914

Pulled By: fgwu

fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198