Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two-level Indexes #1814

Closed

Conversation

maysamyabandeh
Copy link
Contributor

@maysamyabandeh maysamyabandeh commented Jan 27, 2017

Partition Index blocks and use a Partition-index as a 2nd level index.

The two-level index can be used by setting
BlockBasedTableOptions::kTwoLevelIndexSearch as the index type and
configuring BlockBasedTableOptions::index_per_partition

t15539501

Partition Index blocks and use a Partition-index as a 2nd level index.

The two-level index can be used by setting
BlockBasedTableOptions::kTwoLevelIndexSearch as the index type and
configuring BlockBasedTableOptions::index_per_partition
@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delay. I haven't finish reading the reader part, but some comments for the rest.

@@ -86,6 +86,9 @@ struct BlockBasedTableOptions {
// The hash index, if enabled, will do the hash lookup when
// `Options.prefix_extractor` is provided.
kHashSearch,

// A two-level index implementation. Both levels are binary search indexes.
kTwoLevelIndexSearch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the new value to block_base_table_index_type_string_map in util/options_helper.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is the feature consider complete (I think you plan to interleave secondary index and filter blocks)? If not, shall we not expose this enum, or have table builder return error for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will add that.

My thinking is to allow enable/disable each feature separately to measure their impact in the benchmarks. That extends to the interleaving phase. I have not thought it through about the switches of how to enable/disable them though and am open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can simply add comment here saying the feature is experimental and not ready for use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Will do that.

std::string key;
std::unique_ptr<IndexBuilder> value;
};
std::vector<Entry> entries; // list of partitioned indexes and their keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing underscore, e.g. s/entries/entries_/
same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks for catching this.

std::string handle_encoding;
last_partition_block_handle.EncodeTo(&handle_encoding);
index_block_builder_.Add(last_entry.key, handle_encoding);
entries.erase(entries.begin());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would a deque or list be better than vector, if you want to remove from head?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Will update it.

@@ -138,6 +141,9 @@ struct BlockBasedTableOptions {
// Same as block_restart_interval but used for the index block.
int index_block_restart_interval = 1;

// number of index keys per partition of indexes in a multi-level index
uint64_t index_per_partition = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Shall we have a better default value?
  2. Have you consider partition index by size instead of by number of indexes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking. The implementation for using number of blocks per partition was straightforward and I figured it is good enough to enable benchmarking phase and when we got good result we can revise the implementation for better configuration or more optimized implementation. Would that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

if (!s.ok()) {
return s;
auto index_builder_status = r->index_builder->Finish(&index_blocks);
if (index_builder_status.IsIncomplete()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving partitioned index block logic here would make the code easier to read?

if (status.IsInComplete()) {
    while (status.IsInComplete()) {
        // write partitioned index
    }
} else {
    if (!status.ok()) {
        return status;
    }
    // write meta
    // write index
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with @maysamyabandeh, please ignore this comment as it will mess up block order.

@@ -153,8 +153,9 @@ class BlockBasedTable::IndexReader {
virtual ~IndexReader() {}

// Create an iterator for index access.
// An iter is passed in, if it is not null, update this one and return it
// If it is null, create a new Iterator
// If a non-null iter is passed in it MIGHT be used instead of creating a new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the logic here. When it will use the BlockIter pass in and when it will not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it would always make use of the passed BlockIter object. Currently the existing classes that inherit from index reader still make use of it. However the new class that is added with this patch, index reader for partitioned indexes, does not. So we needed to update the contract and make the users to check the return value to verify whether the input object was used by NewIterator or it was ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks messy to have to check if result is on heap or stack. Can we update PartitionIndexReader to use the input iter?

@@ -1380,6 +1439,28 @@ class BlockBasedTable::BlockEntryIteratorState : public TwoLevelIteratorState {
bool skip_filters_;
};

BlockBasedTable::IndexPartitionIteratorState::IndexPartitionIteratorState(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems IndexPartitionIteratorState is same as BlockEntryIteratorState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Due to similarity of index and data blocks the impl of IndexPartitionIteratorState seems to have evolved to be identical to that of BlockEntryIteratorState. I will remove it then.

@@ -86,6 +86,9 @@ struct BlockBasedTableOptions {
// The hash index, if enabled, will do the hash lookup when
// `Options.prefix_extractor` is provided.
kHashSearch,

// A two-level index implementation. Both levels are binary search indexes.
kTwoLevelIndexSearch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can simply add comment here saying the feature is experimental and not ready for use.

@siying
Copy link
Contributor

siying commented Jan 31, 2017

Please make sure we clearly document the format in the code comments (and in wiki pages after it is committed).

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

* The format on the disk would be I I I I I I IP where I is block containing a
* partition of indexes built using ShortenedIndexBuilder and IP is a block
* containing a secondary index on the partitions, built using
* ShortenedIndexBuilder.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying I tried to document the format here. Is there any other place in the source that needs to get updated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link. I updated the wiki.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me in general.

* The format on the disk would be I I I I I I IP where I is block containing a
* partition of indexes built using ShortenedIndexBuilder and IP is a block
* containing a secondary index on the partitions, built using
* ShortenedIndexBuilder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -149,6 +150,9 @@ class BlockBasedTable : public TableReader {
// The key retrieved are internal keys.
Status GetKVPairsFromDataBlocks(std::vector<KVPairBlock>* kv_pair_blocks);

//class IndexPartitionIteratorState;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused. Remove?

@@ -153,8 +153,9 @@ class BlockBasedTable::IndexReader {
virtual ~IndexReader() {}

// Create an iterator for index access.
// An iter is passed in, if it is not null, update this one and return it
// If it is null, create a new Iterator
// If a non-null iter is passed in it MIGHT be used instead of creating a new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks messy to have to check if result is on heap or stack. Can we update PartitionIndexReader to use the input iter?

@yiwu-arbug
Copy link
Contributor

btw. run make format before commit?

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

2 similar comments
@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@IslamAbdelRahman has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Copy link
Contributor

siying commented Feb 8, 2017

Recently we started to see more Travis time-out. We added one extra test case here, which may make the tests run slightly longer, which may be a reason. Can you check whether the newly added test case runs much slower than others?

@maysamyabandeh
Copy link
Contributor Author

maysamyabandeh commented Feb 8, 2017

Here are the differences between the slow and fast travis runs:

$ cat log-fast.txt | sort -nt'(' -k2 -r | grep  OK | head
[       OK ] ExternalSSTFileTest.CompactDuringAddFileRandom (74505 ms)
[       OK ] ExternalSSTFileTest.OverlappingRanges (60118 ms)
[       OK ] DBWALTest.RecoverFromCorruptedWALWithoutFlush (52164 ms)
[       OK ] ExternalSSTFileTest.IngestFileWithGlobalSeqnoRandomized (38311 ms)
[       OK ] DBIteratorTest.PinnedDataIteratorRandomized (29303 ms)
[       OK ] DBTestCompactionFilter.CompactionFilterWithValueChange (20010 ms)
[       OK ] FaultTest/FaultInjectionTest.FaultTest/0 (19314 ms)
[       OK ] FaultTest/FaultInjectionTest.FaultTest/1 (17737 ms)
[       OK ] DBWALTest.kPointInTimeRecovery (14958 ms)
[       OK ] ManualCompactionTest.Test (13979 ms)
$ cat log-slow.txt | sort -nt'(' -k2 -r | grep OK | head
[       OK ] ExternalSSTFileTest.CompactDuringAddFileRandom (237878 ms)
[       OK ] ExternalSSTFileTest.OverlappingRanges (169291 ms)
[       OK ] ExternalSSTFileTest.IngestFileWithGlobalSeqnoRandomized (164520 ms)
[       OK ] DBWALTest.RecoverFromCorruptedWALWithoutFlush (74575 ms)
[       OK ] DBIteratorTest.PinnedDataIteratorRandomized (61493 ms)
[       OK ] FaultTest/FaultInjectionTest.FaultTest/0 (39987 ms)
[       OK ] FaultTest/FaultInjectionTest.FaultTest/1 (32871 ms)
[       OK ] DBWALTest.RollLog (27616 ms)
[       OK ] DBTestCompactionFilter.CompactionFilterWithValueChange (25138 ms)
[       OK ] DBCompactionTest.DeleteFileRange (23110 ms)

@siying
Copy link
Contributor

siying commented Feb 24, 2017

@maysamyabandeh I can work on moving some critical tests out of these suites so the remaining ones can run on MemEnv. How about that?

@maysamyabandeh
Copy link
Contributor Author

Is this related to Multi-level index?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants