Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse data block iterator in BlockBasedTableReader::MultiGet() #5314

Closed
wants to merge 6 commits into from

Conversation

anand1976
Copy link
Contributor

@anand1976 anand1976 commented May 16, 2019

Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement.

TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4

Without the change -
multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)

With the change -
multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)

Test:
make check
asan_crash
asan_check

Copy link
Contributor

@sagar0 sagar0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Can you mention the db_bench options that you used to run this multigetrandom benchmark in the summary, so that they could be part of the commit message.

@@ -2872,6 +2872,8 @@ void BlockBasedTable::MultiGet(const ReadOptions& read_options,
iiter_unique_ptr.reset(iiter);
}

std::unique_ptr<DataBlockIter> biter(new DataBlockIter());;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid this allocation? We used to put it in stack and now we regress to heap allocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying Curious what's the problem with DataBlockIter being on heap here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sagar0 we need to call malloc. It's something we general want to avoid in critical paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like if we have a Reset() method for DataBlockIter, then we do not have to perform dynamic memory allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a good point. Let me see how to avoid it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that in BlockBasedTableIterator, we don't even call Reset(), and just call NewDataBlockIterator() to move on to a new data block.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if we can avoid heap allocation.

@@ -2872,6 +2872,8 @@ void BlockBasedTable::MultiGet(const ReadOptions& read_options,
iiter_unique_ptr.reset(iiter);
}

std::unique_ptr<DataBlockIter> biter(new DataBlockIter());;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like if we have a Reset() method for DataBlockIter, then we do not have to perform dynamic memory allocation.

NewDataBlockIterator<DataBlockIter>(
rep_, read_options, iiter->value(), &biter, false,
true /* key_includes_seq */, get_context);
if (iiter->value().offset() != bhandle.offset()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the initial value of bhandle.offset() before it is initialized? If it is 0, how if iiter->value().offset() is also 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good point. It looks like the default constructor initializes offset_ to 0. To make it more robust, I'll check both offset and size and initialize bhandle to kNullBlockHandle.

rep_, read_options, iiter->value(), &biter, false,
true /* key_includes_seq */, get_context);
if (iiter->value().offset() != bhandle.offset()) {
bhandle = iiter->value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It seems that we only need to store offset here and we don't need to copy the whole handle.

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

rep_, read_options, iiter->value(), &biter, false,
true /* key_includes_seq */, get_context);
if (iiter->value().offset() != bhandle.offset() ||
iiter->value().size() != bhandle.size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe offset is enough. We can skip the size check. We already do it when reseeking in iterator since long ago. In this way, actually only offset of the previous block handle needs to be stored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think 0 is a valid offset. I just got a failure in asan_crash because of it. But I believe we can just check for offset by initializing it to ULONG_MAX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 0 is a valid offset. I agree that max int is a good idea. Make sure you use the one defined in port.h.

@anand1976
Copy link
Contributor Author

Found a bug and fixed in the latest commit. When reusing a data block for a key, the ref count needs to be incremented for the corresponding block cache handle, and we need to register a cleanup function so the ref count can be decrement when the PinnableSlice is deleted.

anand76 added 6 commits June 9, 2019 20:25
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@anand1976 merged this pull request in 63ace8e.

vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
…ook#5314)

Summary:
Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement.

TEST_TMPDIR=/dev/shm/multiget numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4

Without the change -
multireadrandom :       3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)

With the change -
multireadrandom :       3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)
Pull Request resolved: facebook#5314

Differential Revision: D15742108

Pulled By: anand1976

fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants