Add a clipping internal iterator #8327
Conversation
Last week I did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was based on the requirements of #7214 from @adamretter.
Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.
Interesting! Will take a look.
Here's the thing: from an inheritance standpoint,
This might be fixable by wrapping the lower-level iterators (DataIterator, for example) when necessary. Then the InternalIterator may not even need to care about the start/end slices.
You mean wrapped in
Force-pushed from 3b9a97e to 87f4d6b
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
void Seek(const Slice& target) override {
  if (start_ && cmp_->Compare(target, *start_) < 0) {
    iter_->Seek(*start_);
    UpdateAndEnforceUpperBound();
Why not call SeekToFirst() here?
I opted for this because here we actually know which branch of SeekToFirst we need; in other words, calling SeekToFirst would involve a redundant check (we've already established start_ is not nullptr).
// Upper bound is exclusive, so we need a key which is strictly smaller
if (iter_->Valid() && cmp_->Compare(iter_->key(), *end_) == 0) {
  iter_->Prev();
Not sure whether we should assert that the key is strictly smaller than the end key here. There have been cases where we found duplicated internal keys, and we might want to catch this earlier in debug mode.
We do have such assertions: the AssertBounds method, which is called after each iterator move, checks whether the KV is in bounds.
  UpdateAndEnforceUpperBound();
}
bool NextAndGetResult(IterateResult* result) override {
Why not call iter_->NextAndGetResult()? At least add a comment explaining why not.
namespace ROCKSDB_NAMESPACE {
class ClippingIterator : public InternalIterator {
Add a comment to the class explaining what it is for.
InternalIterator* iter_;
const Slice* start_;
const Slice* end_;
I feel a little nervous about the unowned Slices. We do have one for the upper bound, but that was kind of a hack to allow users to change it between Seeks. If the iterator is intended to be used in compactions, which involve long scans, it feels safer to make a copy into a string here.
Pointer semantics provides a straightforward way here to express that these two bounds are optional (the first subcompaction has no start, and the final one has no end).
Can the same be achieved with std::unique_ptr<std::string>?
@ltamasi has updated the pull request. You must reimport the pull request before landing.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…ations in ClippingIterator
…iterator's NextAndGetResult, add test
Force-pushed from c8a241f to 97b200f
@ltamasi has updated the pull request. You must reimport the pull request before landing.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Logically, subcompactions process a key range [start, end); however, the way this is currently implemented is that the CompactionIterator for any given subcompaction keeps processing key-values until it actually outputs a key that is out of range, which is then discarded. Instead of doing this, the patch introduces a new type of internal iterator called ClippingIterator, which wraps another internal iterator and "clips" its range of key-values so that any KVs returned are strictly in the [start, end) interval. This does eliminate a (minor) inefficiency by stopping processing in subcompactions exactly at the limit; however, the main motivation is related to BlobDB: namely, we need this to be able to measure the amount of garbage generated by a subcompaction precisely and prevent off-by-one errors.
Test Plan:
make check