Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customized BlockBasedTableIterator and LevelIterator #3406

Closed
wants to merge 2 commits into from

Conversation

siying
Copy link
Contributor

@siying siying commented Jan 25, 2018

Summary:
Use a customzied BlockBasedTableIterator and LevelIterator to replace current implementations leveraging two-level-iterator. Hope the customized logic will make code easier to understand. As a side effect, BlockBasedTableIterator reduces the allocation for the data block iterator object, and avoid the virtual function call to it, because we can directly reference BlockIter, a final class. Similarly, LevelIterator reduces virtual function call to the dummy iterator iterating the file metadata. It also enabled further optimization.

The upper bound check is also moved from index block to data block. This implementation fits this iterator better. After the change, forwared iterator is slightly optimized to ensure we trim those iterators.

The two-level-iterator now is only used by partitioned index, so it is simplified.

Test Plan: Run all existing tests using ASAN.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

@siying siying changed the title Customized BlockBasedTableIterator Customized BlockBasedTableIterator and LevelIterator Jan 26, 2018
@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comment and general questions. And I haven't go through BlockBasedTableReader yet.

@@ -19,16 +19,10 @@ class InternalKeyComparator;
class Arena;

struct TwoLevelIteratorState {
explicit TwoLevelIteratorState(bool _check_prefix_may_match)
: check_prefix_may_match(_check_prefix_may_match) {}
explicit TwoLevelIteratorState() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not needing explicit keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -76,6 +76,8 @@ class InternalIterator : public Cleanable {
// with PinnedIteratorsManager.
virtual void SetPinnedItersMgr(PinnedIteratorsManager* pinned_iters_mgr) {}

virtual bool MayPositionAfterEnd() { return true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to explain what it means to position after end?

false /* don't sample compaction */));
list[num++] = new LevelIterator(
cfd->table_cache(), read_options, env_options_compactions,
cfd->internal_comparator(), c->input_levels(which), false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add comment what's the last false.

public:
// @param skip_filters Disables loading/accessing the filter block
LevelFileIteratorState(TableCache* table_cache,
explicit LevelIterator(TableCache* table_cache,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: explicit keyword not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not that uncommon, I think, to always mark constructors explicit so that the keyword's not forgotten when people add/remove arguments and it one day ends up being a single-argument constructor.

Copy link
Contributor Author

@siying siying Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug let me remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr make sense too.

SetFileIterator(nullptr);
return;
} else {
if (file_iter_.iter() != nullptr && !file_iter_.status().IsIncomplete() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when file_iter_.status() can be incomplete? Do we need to check other status?

Copy link
Contributor Author

@siying siying Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When users set read_options.read_tiers=kNoIO or something like that, when block cache miss happens, rather than reading from disk, it continues to the next block and set status to incomplete. I suspect that it was a bug in initial implementation, but I didn't change this behavior in this PR.

level_(level),
range_del_agg_(range_del_agg) {}
range_del_agg_(range_del_agg),
may_position_after_end_(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does may_position_after_end mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me remove it if it is not needed in this iterator.
I used this flag to identify that the iterator is made invalid because of upper bound check. This information is important to an upper level iterator because, unless knowing this information, the upper level will assume this is the end so that they may consider to open the next file.

// 16-byte value containing the file number and file size, both
// encoded using EncodeFixed64.
class LevelFileNumIterator : public InternalIterator {
class LevelIterator final : public InternalIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class still feel like TwoLevelIterator with some renaming. Any chance to simplify it further?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make TwoLevelIterator much simpler now. Maybe it can be made even simpler. Before TwoLevelIterator contains lots of special logic, like prefix may match, etc. Now the logic only goes to where it is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request.

@siying
Copy link
Contributor Author

siying commented Feb 2, 2018

Addressed the comments. I changed from MayPositionAfterEnd() to IsOutOfBound() and add comments about it. Hope it will be clearer.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yiwu-arbug yiwu-arbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the super delay. Only some minor comments.

skip_filters_(skip_filters),
is_index_(is_index),
block_map_(block_map) {}
: table_(table), block_map_(block_map) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the params of this constructor is unused. Can we remove them?


SavePrevIndexValue();

index_iter_->Seek(target);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment on why here we call Seek not SeekForPrev? And would it be correct and more efficient to call SeekForPrev()?

bool is_out_of_bound_ = false;
bool check_filter_;
// TODO use block offset instead
std::string prev_index_value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private member should end with underscore.

}

// indicates if we are on the last page that need to be pre-fetched
bool prefetching_boundary_page = false;
void BlockBasedTableIterator::FindKeyBackward() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: there's an extra indent starting from here.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request.

Customized BlockBasedTableIterator

Summary:
Use a customzied BlockBasedTableIterator to replace current implementation leveraging two level iterator. Hope the customized logic will make code easier to understand. As a side effect, we reduce the allocation for the data block iterator object, and avoid the virtual function call to it, because we can directly reference BlockIter, a final class. It also enabled further optimization.

The upper bound check is also moved from index block to data block. This implementation fits this iterator better. After the change, forwared iterator is slightly optimized to ensure we trim those iterators.

Test Plan: Run all existing tests using ASAN.

Simplify two-level-iterator

fix clang

Fix issues because of rebase

Address comments.

Fix a memory issue
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@siying has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Copy link
Contributor Author

siying commented Feb 12, 2018

Addressed comments.

@siying
Copy link
Contributor Author

siying commented Feb 12, 2018

appveyor stucks now. But it has passed in a previous revision, and we don't have many changes since then. Land it.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Copy link
Contributor Author

siying commented Feb 13, 2018

The Travis times out in test group 2. The master branch experienced the same issue this morning, so I think it's not the problem.

state_->~TwoLevelIteratorState();
}
}
virtual ~TwoLevelIterator() { delete state_; }
Copy link
Contributor

@miasantreble miasantreble Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the new design, who's responsible for deleting first_level_iter_ and second_level_iter_?
RIght now there are memory leaks that trace back to https://github.com/facebook/rocksdb/blob/master/table/two_level_iterator.cc#L205 and https://github.com/facebook/rocksdb/blob/master/table/two_level_iterator.cc#L196

Call stack see https://gist.github.com/miasantreble/0de9bd7ee11451cb1bebbd108f7b3e80

this is happening after #3692 fixed the test to vary the level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miasantreble this seems like a bug to me.

facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2018
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in #3406
2. various unused param errors introduced by #3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes #3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
maysamyabandeh pushed a commit that referenced this pull request Apr 16, 2018
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in #3406
2. various unused param errors introduced by #3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes #3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
miasantreble added a commit that referenced this pull request Apr 20, 2018
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in #3406
2. various unused param errors introduced by #3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes #3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
maysamyabandeh pushed a commit that referenced this pull request May 8, 2018
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in #3406
2. various unused param errors introduced by #3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes #3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants