Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a clipping internal iterator #8327

Closed
wants to merge 12 commits into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented May 24, 2021

Summary:
Logically, subcompactions process a key range [start, end); however, the way
this is currently implemented is that the CompactionIterator for any given
subcompaction keeps processing key-values until it actually outputs a key that
is out of range, which is then discarded. Instead of doing this, the patch
introduces a new type of internal iterator called ClippingIterator which wraps
another internal iterator and "clips" its range of key-values so that any KVs
returned are strictly in the [start, end) interval. This does eliminate a (minor)
inefficiency by stopping processing in subcompactions exactly at the limit;
however, the main motivation is related to BlobDB: namely, we need this to be
able to measure the amount of garbage generated by a subcompaction
precisely and prevent off-by-one errors.

Test Plan:
make check

@ltamasi ltamasi added the WIP Work in progress label May 24, 2021
@ltamasi ltamasi requested review from ajkr and siying May 24, 2021 18:00
Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I last week did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was based on the requirements for #7214 that @adamretter requires.

Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

@ltamasi
Copy link
Contributor Author

ltamasi commented May 24, 2021

I last week did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was based on the requirements for #7214 that @adamretter requires.

Interesting! Will take a look.

Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

Here's the thing: from an inheritance standpoint, InternalIterator is not an Iterator; it's a separate hierarchy with a similar but not identical set of methods. For my current purposes (compaction), we definitely need this functionality for InternalIterators. At the end of the day, the input of DBIter is also an InternalIterator (wrapped by an IteratorWrapper), so the way I think we could reuse some code would be to use this class instead of implementing the clipping in DBIter itself.

@mrambacher
Copy link
Contributor

I last week did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was based on the requirements for #7214 that @adamretter requires.

Interesting! Will take a look.

Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

Here's the thing: from an inheritance standpoint, InternalIterator is not an Iterator; it's a separate hierarchy with a similar but not identical set of methods. For my current purposes (compaction), we definitely need this functionality for InternalIterators. At the end of the day, the input of DBIter is also an InternalIterator (wrapped by an IteratorWrapper), so the way I think we could reuse some code would be to use this class instead of implementing the clipping in DBIter itself.

This might be fixable by having the lower-level iterators (DataIterator for example) be wrapped if it needs to be. Then the InternalIterator may not even need to care about the start/end slices.

@ltamasi
Copy link
Contributor Author

ltamasi commented May 24, 2021

I last week did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was based on the requirements for #7214 that @adamretter requires.

Interesting! Will take a look.

Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

Here's the thing: from an inheritance standpoint, InternalIterator is not an Iterator; it's a separate hierarchy with a similar but not identical set of methods. For my current purposes (compaction), we definitely need this functionality for InternalIterators. At the end of the day, the input of DBIter is also an InternalIterator (wrapped by an IteratorWrapper), so the way I think we could reuse some code would be to use this class instead of implementing the clipping in DBIter itself.

This might be fixable by having the lower-level iterators (DataIterator for example) be wrapped if it needs to be. Then the InternalIterator may not even need to care about the start/end slices.

You mean wrapped in Iterators (e.g. your BoundedIterator)? That would go against the current design where we use InternalIterators (which use internal keys) internally and Iterators (which typically expose user keys) in the application-facing view.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ltamasi ltamasi changed the title [WIP] Add a clipping internal iterator Add a clipping internal iterator May 27, 2021
@ltamasi ltamasi removed the WIP Work in progress label May 27, 2021
void Seek(const Slice& target) override {
if (start_ && cmp_->Compare(target, *start_) < 0) {
iter_->Seek(*start_);
UpdateAndEnforceUpperBound();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling SeekToFirst()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted for this because here we actually know which branch of SeekToFirst we need, in other words, calling SeekToFirst would involve a redundant check (we've already established start_ is not nullptr).


// Upper bound is exclusive, so we need a key which is strictly smaller
if (iter_->Valid() && cmp_->Compare(iter_->key(), *end_) == 0) {
iter_->Prev();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether we should assert it is smaller than end key here. There were cases where we found duplicated internal keys, and we might want to catch this earlier in debug mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have such assertions: the AssertBounds method, which is called after each iterator move, checks whether the KV is in bounds.

UpdateAndEnforceUpperBound();
}

bool NextAndGetResult(IterateResult* result) override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not calling iter_->NextAndGetResult(). At least add a comment explaining why not.


namespace ROCKSDB_NAMESPACE {

class ClippingIterator : public InternalIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to the class to explain what the class is for.


InternalIterator* iter_;
const Slice* start_;
const Slice* end_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a little bit nervous on unowned Slice. We do have it in upper bound, but that was kind of a hack to allow users to change it between Seek(). If the iterator is supposed to be used in compaction, which only is only used for long scans, it feels like it is safer to make a copy to the string here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointer semantics provides a straightforward way here to express that these two bounds are optional (the first subcompaction has no start, and the final one has no end).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the same achieved by std::unique_ptr<std::string>?

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


InternalIterator* iter_;
const Slice* start_;
const Slice* end_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the same achieved by std::unique_ptr<std::string>?

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ltamasi merged this pull request in db325a5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants