Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partitioned filters #1891

Closed

Conversation

maysamyabandeh
Copy link
Contributor

Partition filter blocks and an index on the partitions. This should reduce the cost of reading filters when they do not fit into the memory.

The feature is currently depend on partitioned indexes and need that being enabled too. Partition filters can be enabled by using full filters and setting partition_filters in table options to true. Currently a filter partition is cut the same time that a index partition is cut, which is controlled by index_per_partition.

A new test is added table/partitioned_filter_block_test.cc and also db/db_bloom_filter_test.cc is updated. The code is also refactored to allow testing the partitioned filter module separately.

@yiwu-arbug
Copy link
Contributor

Sorry for delay. Will find time to review today or tomorrow.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance the PR can be split? It is huge and I don't know where to start reading.

@@ -496,6 +496,13 @@ Status DBTestBase::ReadOnlyReopen(const Options& options) {

Status DBTestBase::TryReopen(const Options& options) {
Close();
last_options_.table_factory.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about (in a separate PR) make the last_options_ a unique_ptr, and destruct the content before assignment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can do a separate patch. But I already tried the destruction approach and it causes other issues. I did not have time to look into it but apparently it would also destruct some objects that will be reused later.

#include "util/crc32c.h"
#include "util/stop_watch.h"
#include "util/string_util.h"
#include "util/xxhash.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of these headers can probably only include in .cc file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will remove them.

@@ -625,6 +625,9 @@ static std::unordered_map<std::string, OptionTypeInfo>
{"index_per_partition",
{offsetof(struct BlockBasedTableOptions, index_per_partition),
OptionType::kUInt64T, OptionVerificationType::kNormal, false, 0}},
{"partition_filters",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does partitioned filters always comes with partitioned index? In that case can we have one less options, and just enable partition filter if index_type==kTwolevelIndexSearch and filter_policy != null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, in general they should not have two come together and the implementation should be improved by future patches (as we discussed offline) to separate them.

In the particular case that it is suggested here (index_type==kTwolevelIndexSearch and filter_policy != null) does not still mean that the filter has to be partitioned. The other way around is true though: if filter is partitioned index has to be partitioned too. But since this restriction is to be lifted in future, I think it is better to keep their configuration separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright. Let's comment in table.h that partition_filters works only with kTwolevelIndexSearch right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

// A Mock of BlockBasedTable
// The only three engaged methods are GetFilter, ReadFilter, and Open. The rest
// are noop.
BlockBasedTable::CachableEntry<FilterBlockReader> BlockBasedTable::GetFilter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's the performance different is with Mocking this way vs. with inheritance? I would prefer inheritance if the difference is negligible, since it is simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require rigorous benchmarking which adds a lot to the cost of preparing this patch. It is much simpler and faster to go with an option that we are confident that is safe and inexpensive.

I do not see much complexity added by the current approach; it simply excludes one object file from a test binary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much overhead if the mock class subclassing BlockBasedTable and make some of the methods virtual calls. I was trying very hard with the key comparator and only get 5% gain (#1928). Let's make the mocking more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a simple test and observed 2% reduction in throughput. I am hence inclined to avoid virtual.

TEST_TMPDIR=/dev/shm/virtualtest/ ./db_bench --benchmarks=fillrandom --num=1000000 -value_size=100 -compression_type=none -bloom_bits=10
TEST_TMPDIR=/dev/shm/virtualtest/ ./db_bench --benchmarks="readseq,readrandom[X3]" --use_existing_db --num=1000000 --reads=10000000 --cache_size=10000000000 -threads=32 -compression_type=none -duration 60 -bloom_bits=10 -disable_auto_compactions

virtual: readrandom [AVG 3 runs] : 7995501 ops/sec; 559.9 MB/sec
non : readrandom [AVG 3 runs] : 8174033 ops/sec; 572.4 MB/sec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way around is making PartitionedFilterBlockBuilder a template, parameterized on BlockBasedTable. This should get rid of virtual overhead, but template is something need to avoid as well.

cc @siying @IslamAbdelRahman @ajkr @lightmark adding more eyes on this. Here this patch is trying to mock BlockBasedTable to unit test PartitionedFilterBlockBuilder. Instead of inherit from BlockBasedTable, the test simply exclude table/block_based_table_reader.cc on compile (see the change to Makefile) and implement a fake BlockBasedTable that's only visible in this test. We are debating whether it worth to do this hack to avoid virtual calls to BlockBasedTable::GetFilter, which is a hot code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to yi offline. With -O3 I still fluctuation in results and cannot confirm that avoiding virtual gave a benefit. Going to turn GetFilter virtual.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

#include "table/persistent_cache_helper.h"
#include "table/block_based_table_reader.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but would forward declaration resolve the cyclic dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried many things until I reached this organization that works. It also seems cleaner to me compared to what we had before: persistent_cache_helper should not depend on block_based_table_reader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't go into the detail of filter block building, but at high level, the multi-inheritance around PartitionIndexBuilder looks messy to me. Can we separate partition index builder and partition filter builder, and let them share a index builder member which builds the meta index?

virtual bool KeyMayMatch(const Slice& key,
uint64_t block_offset = kNotValid) = 0;
virtual bool KeyMayMatch(const Slice& key, uint64_t block_offset = kNotValid,
const bool no_io = false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments about what's no_io and const_ikey_ptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

* containing a secondary index on the partitions, built using
* ShortenedIndexBuilder.
*/
class PartitionIndexBuilder : public IndexBuilder,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to keep the multi-inheritance for now, let's at least rename it to PartitionIndexAndFilterBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -618,7 +247,8 @@ struct BlockBasedTableBuilder::Rep {
TableProperties props;

bool closed = false; // Either Finish() or Abandon() has been called.
std::unique_ptr<FilterBlockBuilder> filter_block;
std::unique_ptr<FilterBlockBuilder> filter_block_gc;
FilterBlockBuilder* filter_block;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename it filter_builder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not the name chosen by me. Would like to do this refactoring as part of this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, let's update this variable name.

#include "util/xxhash.h"

#include "table/index_builder.h"
#include "table/partitioned_filter_block.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of the these headers are not used. clean it up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -51,7 +51,12 @@ class FilterBlockBuilder {
virtual bool IsBlockBased() = 0; // If is blockbased filter
virtual void StartBlock(uint64_t block_offset) = 0; // Start new block filter
virtual void Add(const Slice& key) = 0; // Add a key to current filter
virtual Slice Finish() = 0; // Generate Filter
inline Slice Finish() { // Generate Filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove inline keyword, leave compiler to decide whether to inline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

if (ok() && r->filter_block != nullptr) {
Status s;
Slice filter_content = r->filter_block->Finish(filter_block_handle, &s);
r->props.filter_size += filter_content.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it initialized to 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Slice filter_content = r->filter_block->Finish(filter_block_handle, &s);
r->props.filter_size += filter_content.size();
WriteRawBlock(filter_content, kNoCompression, &filter_block_handle);
while (s.IsIncomplete()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer moving line 642-644 into the while loop. Also shall we fail on non-ok non-incomplete status?

Status s = Status::Incomplete();
while (s.IsIncomplete()) {
  ...
}
if (!s.ok()) {
  return s;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -288,6 +341,115 @@ class BlockBasedTable::BlockEntryIteratorState : public TwoLevelIteratorState {
BlockBasedTable* table_;
const ReadOptions read_options_;
bool skip_filters_;
bool is_user_data_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment on what does this mean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we shall name it is_index_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense.

@@ -208,10 +178,15 @@ class PartitionIndexReader : public IndexReader {
// return a two-level iterator: first level is on the partition index
virtual InternalIterator* NewIterator(BlockIter* iter = nullptr,
bool dont_care = true) override {
// Filters are already checked before seeking the index
const bool skip_filters = true;
const bool is_user_data = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the more common coding style for passing booleans is like:

new BlockBasedTable::BlockEntryIteratorState(
    table_, ReadOptions(), true /*skip_filters*/, false /*is_user_data*/),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I noticed that but I believe mine is more natural. I can still change it back if you feel strongly about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'm okay with either.

table_, ReadOptions(), skip_filters, !is_user_data),
index_block_->NewIterator(comparator_, nullptr, true));
// TODO: Update TwoLevelIterator to be able to make use of on-stack
// BlockIter while the state is on heap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The on-stack/on heap iterator thing still look weird to me... Does it really make any perf difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require rigorous benchmarking to be able to answer that. Even if we want to revert this approach that was introduced long time ago, I would rather do it separately from this patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can leave it for future since it is not introduce in this patch. It is said that "premature optimization is the root of all evil" and I want to avoid that.

@maysamyabandeh maysamyabandeh mentioned this pull request Mar 3, 2017
maysamyabandeh pushed a commit to maysamyabandeh/rocksdb that referenced this pull request Mar 3, 2017
This is the first split of facebook#1891 and will be needed for the upcoming partitioned filter patch.
facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2017
Summary:
This is the first split of #1891 and will be needed for the upcoming partitioned filter patch.
Closes #1949

Differential Revision: D4652152

Pulled By: maysamyabandeh

fbshipit-source-id: 9801778
maysamyabandeh pushed a commit to maysamyabandeh/rocksdb that referenced this pull request Mar 7, 2017
This is the second split of this pull request:
facebook#1891 which includes only the
builder part. The testing will be included in the third split, where the
reader is also included.
facebook-github-bot pushed a commit that referenced this pull request Mar 7, 2017
Summary:
This is the second split of this pull request: #1891 which includes only the builder part. The testing will be included in the third split, where the reader is also included.
Closes #1952

Differential Revision: D4660272

Pulled By: maysamyabandeh

fbshipit-source-id: 36b3cf0
maysamyabandeh pushed a commit to maysamyabandeh/rocksdb that referenced this pull request Mar 20, 2017
This is the last split of this pull request:
facebook#1891 which includes
the reader part as well as the tests.
@maysamyabandeh
Copy link
Contributor Author

The PR is split into three PRs, which are linked from here.

facebook-github-bot pushed a commit that referenced this pull request Mar 22, 2017
Summary:
This is the last split of this pull request: #1891 which includes the reader part as well as the tests.
Closes #1961

Differential Revision: D4672216

Pulled By: maysamyabandeh

fbshipit-source-id: 6a2b829
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants