-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator #12480
Conversation
6582451
to
0a9afc1
Compare
0a9afc1
to
2f211e3
Compare
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design looks great overall, thanks a lot @jaykorean ! Left some comments/questions below
d4a6df5
to
e90568e
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
|
||
namespace ROCKSDB_NAMESPACE { | ||
|
||
// UNDER CONSTRUCTION - DO NOT USE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to lift this after adding this to stress_test
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @jaykorean !! This looks awesome!
std::function<void()> reset_func, | ||
std::function<void(ColumnFamilyHandle*, Iterator*)> populate_func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor and we can do this as a follow-up but we can avoid the overhead of std::function
by turning MultiCfIteratorImpl
into a class template that takes the types of these two functors as template parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using the template
first, but had a couple of issues that I ended up using std::function
here.
- The compiler was complaining with the following message and I still haven't figured out the reason.
./db/multi_cf_iterator_impl.h: In member function ‘void rocksdb::MultiCfIteratorImpl<ResetFuncType, PopulateFuncType>::InitMinHeap()’:
./db/multi_cf_iterator_impl.h:175:33: error: expected primary-expression before ‘>’ token
175 | heap_.emplace<MultiCfMinHeap>(
- Minor, but in
CoalescingIterator
andAttributeGroupIterator
, the impl had to contain the types of the two functions. e.g.MultiCfIteratorImpl<std::function<void()>, std::function<void(ColumnFamilyHandle*, Iterator*)>> impl_;
which I found not as neaty/readable as I wanted it to be.
e88536f
to
24f0807
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
24f0807
to
46f50a1
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean merged this pull request in 58a98bd. |
…on (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (#12422, #12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: #12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
Summary
There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic
Iterator
interface.value()
/columns()
returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns.attribute_groups()
API, which is not yet implemented, will not be useful for a single-CF iterator.In this PR, we are implementing the following changes:
IteratorBase
introduced, which includes all basic iterator functions exceptvalue()
andcolumns()
.Iterator
, which now inherits fromIteratorBase
, includesvalue()
andcolumns()
.AttributeGroupIterator
inherits fromIteratorBase
and additionally includesattribute_groups()
(to be implemented).MultiCfIterator
toCoalescingIterator
which inherits fromIterator
CoalescingIteratorTest
andAttributeGroupIteratorTest
.wide_columns.h
to a new file,attribute_groups.h
.Some Implementation Details
MultiCfIteratorImpl
takes two functions -populate_func
andreset_func
and use them to populatevalue_
andcolumns_
in CoalescingIterator andattribute_groups_
in AttributeGroupIterator. In CoalescingIterator, populate_func isCoalesce()
, in AttributeGroupIterator populate_func isAddToAttributeGroups()
.reset_func
clears populated value_, columns_ and attribute_groups_ accordingly.Coalesce()
merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones.For example, if CF1 has
"key_1" ==> {"col_1": "foo", "col_2", "baz"}
and CF2 has"key_1" ==> {"col_2": "quux", "col_3", "bla"}
, and when the iterator is atkey_1
,columns()
will return{"col_1": "foo", "col_2", "quux", "col_3", "bla"}
In this example,
value()
will be empty, because none of them have values forkDefaultColumnName
Test Plan
Unit Test
Performance Test
To make sure this change does not impact existing
Iterator
performanceBuild
Setup
Run
Before the change
After the change