Improve point-lookup performance using a data block hash index #4174
Conversation
…eOptions Reviewers: Sagar Vemuri
Summary: The first step of the DataBlockHashIndex implementation. A string-based hash table is implemented and unit-tested.
- DataBlockHashIndexBuilder: Add() takes pairs of <key, restart_index> and formats them into a string when Finish() is called.
- DataBlockHashIndex: initialized from the formatted string, and can interpret it as a hash table, supporting Seek().
Test Plan: Unit test: data_block_hash_index_test; make check -j32
Reviewers: Sagar Vemuri
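The Add()/Finish() flow described in this summary can be sketched as a small standalone toy. Class names and the one-byte-per-bucket format here are illustrative, not the actual RocksDB implementation (which also stores key checksums to filter out collisions; this toy simply overwrites on collision):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Toy sketch of the builder/reader pair: Add() collects <key, restart_index>
// pairs, Finish() serializes them into a string (one byte per bucket here),
// and the reader is initialized from that string and supports Seek().
class ToyHashIndexBuilder {
 public:
  explicit ToyHashIndexBuilder(size_t num_buckets)
      : buckets_(num_buckets, kNoEntry) {}

  void Add(const std::string& key, uint8_t restart_index) {
    size_t b = std::hash<std::string>{}(key) % buckets_.size();
    buckets_[b] = static_cast<char>(restart_index);  // collisions overwrite
  }

  // Serialize the whole table; in the real format this string is appended
  // at the end of the data block.
  std::string Finish() const { return buckets_; }

  static constexpr char kNoEntry = static_cast<char>(0xFF);

 private:
  std::string buckets_;  // one byte per bucket
};

class ToyHashIndex {
 public:
  explicit ToyHashIndex(std::string data) : data_(std::move(data)) {}

  // Returns the restart index stored in the key's bucket (kNoEntry if empty).
  char Seek(const std::string& key) const {
    return data_[std::hash<std::string>{}(key) % data_.size()];
  }

 private:
  std::string data_;
};
```

The point of the string round-trip is that the serialized table can be written into the block on the build path and reinterpreted in place on the read path, with no per-block deserialization step.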
Summary: The Seek() in the initial implementation is inefficient in that it needs vector creation and emplace operations, which can be eliminated by an iterator-based implementation.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
table/block.cc
Outdated
assert(data_block_hash_index_);
Slice user_key = ExtractUserKey(target);
std::unique_ptr<DataBlockHashIndexIterator> data_block_hash_iter(
    data_block_hash_index_->NewIterator(user_key));
We need to get rid of this malloc here.
I may have an iterator instance embedded in the DataBlockIter to avoid the extra malloc, and initialize it every time I use it. Would this work? The trade-off is that a new DataBlockIter is malloc-ed every time BlockBasedTable::Get() is called. If I can embed the DataBlockHashIndexIterator into DataBlockIter, we save half of the mallocs but lose a little space.
Please don't do that.
Instead, you can allocate it on the stack, something like what we do in DataBlockIterator:
DataBlockHashIndexIterator iter;
data_block_hash_index_->NewIterator(user_key, &iter);
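The out-parameter pattern suggested above can be sketched as a standalone toy (the names here are illustrative, not the actual RocksDB API): the caller owns a stack-allocated iterator and the factory initializes it in place, so the lookup path performs no heap allocation for the iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Caller-owned iterator over the restart indices stored in one hash bucket.
struct StackHashIndexIterator {
  const std::vector<int>* entries = nullptr;
  size_t pos = 0;

  bool Valid() const { return entries != nullptr && pos < entries->size(); }
  void Next() { ++pos; }
  int Value() const { return (*entries)[pos]; }
};

struct StackHashIndex {
  std::vector<int> bucket_entries;

  // Initializes a caller-owned iterator in place; no malloc on this path.
  void NewIterator(StackHashIndexIterator* iter) const {
    iter->entries = &bucket_entries;
    iter->pos = 0;
  }
};
```

With this shape, the lookup loop `for (; iter.Valid(); iter.Next())` runs entirely against a stack object, which is the allocation-free behavior the reviewer is asking for.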
Another way is to totally bypass the BlockIter interface, and have a new call path from BlockBasedTable::Get() to Block::Get() (TODO). I think this is doable, and it also solves the extra memory access from the restart index to the restart offset, as described below.
table/block.cc
Outdated
for (; data_block_hash_iter->Valid(); data_block_hash_iter->Next()) {
  uint32_t restart_index = data_block_hash_iter->Value();
  SeekToRestartPoint(restart_index);
Think of how to get rid of this extra memory access.
table/block.cc
Outdated
// 2) (a key larger than user_key) + seq + type
    break;
  }
}
Can you explain how it works?
Assume we have the following keys: 1, 2, 3, ..., 99.
Let's say one bucket contains two keys, 22 and 88, and you are seeking "22" and get into this bucket. Do you go through this for-loop from 22, 23 up until 88?
Short answer: the for-loop will return as soon as it finds the first key_ that matches the seek_key, without further looping over the rest of the keys in the bucket.

More details: the entries stored in the bucket are a checksum of the key plus the residing restart interval index (or restart offset). Say key01, key02, ..., key99 are in the block. key22 and key88 are hashed to the same bucket, among other keys. Assume they belong to different restart intervals R2 and R8.

When seeking key22, we hash the key to find the bucket. There can be many entries in the bucket. The iterator only emits entries that have a matching checksum, meaning a potential match. But we still have to linearly check the restart interval to see if the key is really there or not, as some other key hashed to this bucket may happen to have an identical checksum.

In the example, say key22 and key88 have the same checksum. The outer for-loop is supposed to loop over the two entries for key22 and key88. Then in the inner while-loop, it finds the entry for key22, jumps to the restart interval R2, and performs a linear search within that restart interval. It will then break out of the inner while-loop when key_ >= user_key. A following check makes sure the user key matches. If it is a match, the function returns without going through another for-loop iteration on the bucket entry corresponding to key88.
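The two-level lookup described above can be modeled in a self-contained sketch (illustrative names and a one-byte FNV checksum, not the RocksDB code): the outer loop visits bucket entries whose checksum matches the seek key, and the inner loop linearly scans the candidate restart interval to confirm or reject the match.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Interval = std::vector<std::string>;  // sorted keys of one interval

// One-byte checksum of a key (FNV-1a, truncated); collisions are possible
// and are exactly why the inner linear scan is still needed.
static uint8_t Checksum(const std::string& key) {
  uint32_t h = 2166136261u;
  for (char c : key) h = (h ^ static_cast<uint8_t>(c)) * 16777619u;
  return static_cast<uint8_t>(h);
}

struct BucketEntry {
  uint8_t checksum;      // checksum of the stored key
  size_t restart_index;  // which restart interval contains the key
};

// Returns true iff `key` is found via one of the checksum-matching entries.
bool SeekInBucket(const std::vector<BucketEntry>& bucket,
                  const std::vector<Interval>& intervals,
                  const std::string& key) {
  for (const BucketEntry& e : bucket) {        // outer loop over entries
    if (e.checksum != Checksum(key)) continue; // only potential matches
    for (const std::string& k : intervals[e.restart_index]) {
      if (k == key) return true;  // confirmed; no further looping
      if (k > key) break;         // past the slot; try the next entry
    }
  }
  return false;
}
```

Note how a false checksum match (some other key with an identical checksum) only costs one extra interval scan before the outer loop moves on, matching the behavior described in the comment.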
…sh_index_block_level
@fgwu has updated the pull request.
@fgwu has updated the pull request.
Force-pushed 637b33c to 8462db0
@fgwu has updated the pull request.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed 8462db0 to 995c790
@fgwu has updated the pull request.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…tor() Summary: Move the iterator memory space to the stack. Address AppVeyor's complaint about the type cast from size_t to uint16_t
Force-pushed 995c790 to 8e14351
@fgwu has updated the pull request.
@fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired. Before we can review or merge your code, we need you to email cla@fb.com with your details so we can update your status.
@fgwu has updated the pull request.
@fgwu has updated the pull request.
…about record type
@fgwu has updated the pull request.
@fgwu has updated the pull request.
…sh_index_block_level
Force-pushed 5722699 to 82d61b1
@fgwu has updated the pull request.
@fgwu has updated the pull request.
fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@fgwu has updated the pull request.
fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@fgwu has updated the pull request.
fgwu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks, this looks great. Let's get this in.
Suggested changes before landing:
- Update the function name in comparator.h
- Update summary with latest numbers.
- Update Test plan section.
include/rocksdb/comparator.h
Outdated
// as equal by this comparator.
// The major use case is to determine if DataBlockHashIndex is compatible
// with the customized comparator.
virtual bool CanKeysWithDifferentByteContentsEqual() const { return true; }
Nit: (as this is in the public API):
Add a 'Be' in between.
s/CanKeysWithDifferentByteContentsEqual/CanKeysWithDifferentByteContentsBeEqual/
…thDifferentByteContentsBeEqual/
@fgwu has updated the pull request.
fgwu is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…ook#4174) Summary: Add hash index support to data blocks, which helps to reduce the CPU utilization of point-lookup operations. This feature is backward compatible with the data block created without the hash index. It is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `data_block_index_type = kDataBlockBinaryAndHash`. The DB size would be bigger with the hash index option as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable using `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio will improve more on the point-lookup efficiency, but take more space too.
Pull Request resolved: facebook#4174
Differential Revision: D8965914
Pulled By: fgwu
fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198
Summary:
Add hash index support to data blocks, which helps to reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `data_block_index_type = kDataBlockBinaryAndHash`.

The DB size would be bigger with the hash index option, as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable using `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio improves point-lookup efficiency more, but also takes more space.

Test Plan:
Added unit test; ran `make -j32 check` and made sure all tests pass.

Some performance numbers. These experiments run against SSDs.
`CPU Util` is the CPU utilization percentage of the `DataBlockIter` point-lookup in `db_bench`, as captured by `perf`.

We also compare with the `master` branch on which the feature PR is based, to make sure there is no performance regression in the default binary seek case. These experiments run against tmpfs without `perf`.
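For reference, enabling the feature described in this summary might look like the following. This is a sketch assuming the RocksDB public API around the time of this PR (option names match the summary above; the 0.75 ratio is just an example value):

```cpp
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Opt in to the data block hash index (disabled by default).
rocksdb::Options MakeOptionsWithDataBlockHashIndex() {
  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.data_block_index_type =
      rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
  // Lower ratio -> better point-lookup CPU, but more space per block.
  table_opts.data_block_hash_table_util_ratio = 0.75;

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```

Because the format is backward compatible, blocks written without the hash index remain readable after flipping this option; only newly written blocks carry the extra hash table.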