Reuse data block iterator in BlockBasedTableReader::MultiGet() #5314
Conversation
LGTM.
Can you mention the db_bench options you used to run this multireadrandom benchmark in the summary, so that they can be part of the commit message?
table/block_based_table_reader.cc (Outdated)
@@ -2872,6 +2872,8 @@ void BlockBasedTable::MultiGet(const ReadOptions& read_options,
    iiter_unique_ptr.reset(iiter);
  }

+ std::unique_ptr<DataBlockIter> biter(new DataBlockIter());
Can we avoid this allocation? We used to put it on the stack, and now we regress to a heap allocation.
@siying Curious what's the problem with DataBlockIter being on the heap here?
@sagar0 We need to call malloc. It's something we generally want to avoid in critical paths.
Looks like if we have a Reset() method for DataBlockIter, then we do not have to perform dynamic memory allocation.
Oh, that's a good point. Let me see how to avoid it.
It appears that in BlockBasedTableIterator, we don't even call Reset(), and just call NewDataBlockIterator() to move on to a new data block.
LGTM if we can avoid heap allocation.
table/block_based_table_reader.cc
Outdated
NewDataBlockIterator<DataBlockIter>( | ||
rep_, read_options, iiter->value(), &biter, false, | ||
true /* key_includes_seq */, get_context); | ||
if (iiter->value().offset() != bhandle.offset()) { |
What's the initial value of bhandle.offset() before it is initialized? If it is 0, what if iiter->value().offset() is also 0?
Yes, that's a good point. It looks like the default constructor initializes offset_ to 0. To make it more robust, I'll check both offset and size, and initialize bhandle to kNullBlockHandle.
table/block_based_table_reader.cc
Outdated
rep_, read_options, iiter->value(), &biter, false, | ||
true /* key_includes_seq */, get_context); | ||
if (iiter->value().offset() != bhandle.offset()) { | ||
bhandle = iiter->value(); |
Nit: It seems that we only need to store the offset here; we don't need to copy the whole handle.
LGTM
table/block_based_table_reader.cc (Outdated)
          rep_, read_options, iiter->value(), &biter, false,
          true /* key_includes_seq */, get_context);
      if (iiter->value().offset() != bhandle.offset() ||
          iiter->value().size() != bhandle.size()) {
I believe the offset is enough; we can skip the size check. We have long done it that way when reseeking in the iterator. That way, only the offset of the previous block handle actually needs to be stored.
Actually, I think 0 is a valid offset. I just got a failure in asan_crash because of it. But I believe we can still check only the offset by initializing it to ULONG_MAX.
Yes, 0 is a valid offset. I agree that max int is a good idea. Make sure you use the one defined in port.h.
Found a bug and fixed it in the latest commit. When reusing a data block for a key, the ref count needs to be incremented for the corresponding block cache handle, and we need to register a cleanup function so the ref count can be decremented when the block is no longer referenced.
force-pushed from 9781ba5 to 3614d10
force-pushed from 3614d10 to cc8177b
@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@anand1976 merged this pull request in 63ace8e.
Reuse data block iterator in BlockBasedTableReader::MultiGet() (facebook#5314)

Summary:
Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement.

TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4

Without the change -
multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)
With the change -
multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)

Pull Request resolved: facebook#5314

Differential Revision: D15742108

Pulled By: anand1976

fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771
Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement.
TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Without the change -
multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)
With the change -
multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)
Test:
make check
asan_crash
asan_check