Add a clipping internal iterator #8327

ltamasi · 2021-05-24T18:00:07Z

Summary:
Logically, subcompactions process a key range [start, end); however, the way
this is currently implemented is that the CompactionIterator for any given
subcompaction keeps processing key-values until it actually outputs a key that
is out of range, which is then discarded. Instead of doing this, the patch
introduces a new type of internal iterator called ClippingIterator which wraps
another internal iterator and "clips" its range of key-values so that any KVs
returned are strictly in the [start, end) interval. This does eliminate a (minor)
inefficiency by stopping processing in subcompactions exactly at the limit;
however, the main motivation is related to BlobDB: namely, we need this to be
able to measure the amount of garbage generated by a subcompaction
precisely and prevent off-by-one errors.
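
To make the clipping idea concrete, here is a minimal sketch under simplifying assumptions: keys are plain `std::string`s compared with `operator<` rather than internal keys with a comparator, and `VectorIter`, `ClippingIter`, and `CollectClipped` are hypothetical names invented for this illustration, not the classes in the patch. As in the description above, a null bound means "unbounded" on that side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an internal iterator over sorted keys.
class VectorIter {
 public:
  explicit VectorIter(std::vector<std::string> keys) : keys_(std::move(keys)) {}
  bool Valid() const { return pos_ < keys_.size(); }
  void SeekToFirst() { pos_ = 0; }
  // Positions at the first key >= target.
  void Seek(const std::string& target) {
    pos_ = static_cast<std::size_t>(
        std::lower_bound(keys_.begin(), keys_.end(), target) - keys_.begin());
  }
  void Next() { assert(Valid()); ++pos_; }
  const std::string& key() const { assert(Valid()); return keys_[pos_]; }

 private:
  std::vector<std::string> keys_;
  std::size_t pos_ = 0;
};

// Wraps another iterator and only exposes keys in [start, end).
// A null start or end means that side is unbounded.
class ClippingIter {
 public:
  ClippingIter(VectorIter* iter, const std::string* start,
               const std::string* end)
      : iter_(iter), start_(start), end_(end) {}
  bool Valid() const { return valid_; }
  void SeekToFirst() {
    if (start_ != nullptr) {
      iter_->Seek(*start_);
    } else {
      iter_->SeekToFirst();
    }
    EnforceUpperBound();
  }
  void Next() { assert(valid_); iter_->Next(); EnforceUpperBound(); }
  const std::string& key() const { assert(valid_); return iter_->key(); }

 private:
  // The upper bound is exclusive: stop as soon as key >= *end_.
  void EnforceUpperBound() {
    valid_ = iter_->Valid() && (end_ == nullptr || iter_->key() < *end_);
  }

  VectorIter* iter_;
  const std::string* start_;
  const std::string* end_;
  bool valid_ = false;
};

// Collects every key the clipped view exposes, in order.
std::vector<std::string> CollectClipped(std::vector<std::string> keys,
                                        const std::string* start,
                                        const std::string* end) {
  VectorIter base(std::move(keys));
  ClippingIter clipped(&base, start, end);
  std::vector<std::string> out;
  for (clipped.SeekToFirst(); clipped.Valid(); clipped.Next()) {
    out.push_back(clipped.key());
  }
  return out;
}
```

With this shape, the consumer never sees an out-of-range key, so nothing has to be produced and then discarded at the end of the range.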

Test Plan:
make check

mrambacher

Last week I did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was driven by the requirements for #7214, which @adamretter needs.

Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

ltamasi · 2021-05-24T20:32:49Z

> Last week I did something almost identical to this in #8309. Instead of using an InternalIterator, I based mine on a standard Iterator. My implementation was driven by the requirements for #7214, which @adamretter needs.

Interesting! Will take a look.

> Can we somehow resolve the differences and end up with one implementation? I do not believe the WBWI code has an InternalIterator.

Here's the thing: from an inheritance standpoint, InternalIterator is not an Iterator; it's a separate hierarchy with a similar but not identical set of methods. For my current purposes (compaction), we definitely need this functionality for InternalIterators. At the end of the day, the input of DBIter is also an InternalIterator (wrapped by an IteratorWrapper), so the way I think we could reuse some code would be to use this class instead of implementing the clipping in DBIter itself.

mrambacher · 2021-05-24T21:05:18Z

This might be fixable by wrapping the lower-level iterators (DataIterator, for example) where needed. Then the InternalIterator may not even need to care about the start/end slices.

ltamasi · 2021-05-24T21:54:12Z

You mean wrapped in Iterators (e.g. your BoundedIterator)? That would go against the current design where we use InternalIterators (which use internal keys) internally and Iterators (which typically expose user keys) in the application-facing view.

facebook-github-bot · 2021-05-27T22:11:44Z

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2021-06-02T18:24:22Z

db/compaction/clipping_iterator.h

+  void Seek(const Slice& target) override {
+    if (start_ && cmp_->Compare(target, *start_) < 0) {
+      iter_->Seek(*start_);
+      UpdateAndEnforceUpperBound();


Why not call SeekToFirst() here?

ltamasi · 2021-06-03

I opted for this because here we already know which branch of SeekToFirst we need. In other words, calling SeekToFirst would involve a redundant check (we've already established that start_ is not nullptr).
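
The clamping behavior discussed in this thread can be sketched as follows. This is only an illustration under assumptions: it uses an index into a sorted `std::vector<std::string>` instead of an iterator, and `ClampedSeek` is a hypothetical helper name, not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first key >= max(target, *start), or keys.size()
// if there is no such key. A null start means there is no lower bound.
std::size_t ClampedSeek(const std::vector<std::string>& keys,
                        const std::string* start, const std::string& target) {
  // If the target lies before the lower bound, seek to the bound instead.
  // At this point we already know start is non-null, which is why the
  // branch can seek directly rather than re-checking the bound.
  const std::string& effective =
      (start != nullptr && target < *start) ? *start : target;
  return static_cast<std::size_t>(
      std::lower_bound(keys.begin(), keys.end(), effective) - keys.begin());
}
```

A target inside the range is used as is; only targets below the lower bound are redirected.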

siying · 2021-06-02T18:25:42Z

db/compaction/clipping_iterator.h

+
+      // Upper bound is exclusive, so we need a key which is strictly smaller
+      if (iter_->Valid() && cmp_->Compare(iter_->key(), *end_) == 0) {
+        iter_->Prev();


I'm not sure whether we should assert that the key is smaller than the end key here. We have seen cases of duplicated internal keys, and we might want to catch this earlier in debug mode.

ltamasi · 2021-06-03

We do have such assertions: the AssertBounds method, which is called after each iterator move, checks whether the KV is in bounds.
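
A rough sketch of the exclusive upper bound handling and the debug check discussed here, again over a sorted `std::vector<std::string>` with hypothetical helper names (`SeekForPrevIndex`, `SeekToLastClipped`). It also shows the concern about duplicates: if the key equal to the end bound appeared twice, a single step back would still land on it, and the assertion would fire in a debug build.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of the last key <= target, or -1 if there is none.
std::ptrdiff_t SeekForPrevIndex(const std::vector<std::string>& keys,
                                const std::string& target) {
  return (std::upper_bound(keys.begin(), keys.end(), target) - keys.begin()) -
         1;
}

// Index of the last key strictly below *end, or -1 if there is none.
// A null end means there is no upper bound.
std::ptrdiff_t SeekToLastClipped(const std::vector<std::string>& keys,
                                 const std::string* end) {
  if (end == nullptr) {
    return static_cast<std::ptrdiff_t>(keys.size()) - 1;
  }
  std::ptrdiff_t pos = SeekForPrevIndex(keys, *end);
  // The upper bound is exclusive, so step back once if we landed on it.
  if (pos >= 0 && keys[static_cast<std::size_t>(pos)] == *end) {
    --pos;
  }
  // Debug-only bounds check: fails if the end key was duplicated.
  assert(pos < 0 || keys[static_cast<std::size_t>(pos)] < *end);
  return pos;
}
```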

siying · 2021-06-02T18:26:41Z

db/compaction/clipping_iterator.h

+    UpdateAndEnforceUpperBound();
+  }
+
+  bool NextAndGetResult(IterateResult* result) override {


Why not call iter_->NextAndGetResult()? At the very least, add a comment explaining why not.

siying · 2021-06-02T18:27:12Z

db/compaction/clipping_iterator.h

+
+namespace ROCKSDB_NAMESPACE {
+
+class ClippingIterator : public InternalIterator {


Add a comment to the class explaining what it is for.

siying · 2021-06-02T18:32:18Z

db/compaction/clipping_iterator.h

+
+  InternalIterator* iter_;
+  const Slice* start_;
+  const Slice* end_;


I feel a little nervous about the unowned Slice. We do have that for the upper bound, but it was something of a hack to let users change it between Seek() calls. If the iterator is meant to be used in compaction, which is only used for long scans, it feels safer to copy the bound into a string here.

ltamasi · 2021-06-03

Pointer semantics provide a straightforward way to express that these two bounds are optional (the first subcompaction has no start, and the last one has no end).

facebook-github-bot · 2021-06-03T18:17:23Z

@ltamasi has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-06-03T18:17:36Z

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2021-06-09T18:40:19Z

db/compaction/clipping_iterator.h

+
+  InternalIterator* iter_;
+  const Slice* start_;
+  const Slice* end_;


Can the same be achieved with std::unique_ptr<std::string>?
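
One way to read this suggestion, sketched under assumptions: the bounds are copied into owned strings, and a null `std::unique_ptr` still expresses "no bound on this side". `OwnedBounds` and `MakeBounds` are hypothetical names for illustration; this is not a claim about what the merged code does.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Holds optional, owned copies of the [start, end) bounds.
// A null pointer means that side is unbounded.
class OwnedBounds {
 public:
  OwnedBounds(const std::string* start, const std::string* end)
      : start_(start != nullptr ? new std::string(*start) : nullptr),
        end_(end != nullptr ? new std::string(*end) : nullptr) {}

  // True if key is in [start, end), treating null bounds as open.
  bool Contains(const std::string& key) const {
    return (start_ == nullptr || !(key < *start_)) &&
           (end_ == nullptr || key < *end_);
  }

 private:
  std::unique_ptr<std::string> start_;
  std::unique_ptr<std::string> end_;
};

// The caller's strings go out of scope on return, but the copies held by
// OwnedBounds remain valid, which is the safety property being asked for.
OwnedBounds MakeBounds() {
  const std::string start = "b";
  const std::string end = "d";
  return OwnedBounds(&start, &end);
}
```

The trade-off is one extra copy per bound in exchange for not depending on the lifetime of the caller's buffers.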

facebook-github-bot · 2021-06-09T19:41:38Z

@ltamasi has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-06-09T19:41:52Z

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-06-09T22:41:26Z

@ltamasi merged this pull request in db325a5.

ltamasi added the WIP Work in progress label May 24, 2021

ltamasi requested review from ajkr and siying May 24, 2021 18:00

facebook-github-bot added the CLA Signed label May 24, 2021

mrambacher reviewed May 24, 2021

ltamasi force-pushed the clipping_iterator branch from 3b9a97e to 87f4d6b Compare May 27, 2021 22:09

ltamasi changed the title from "[WIP] Add a clipping internal iterator" to "Add a clipping internal iterator" May 27, 2021

ltamasi removed the WIP Work in progress label May 27, 2021

siying reviewed Jun 2, 2021

siying approved these changes Jun 9, 2021

ltamasi added 12 commits June 9, 2021 12:27

Implement ClippingIterator

c8c1d31

Set the correct sequence number for the upper bound

14782b0

Use correct value type for upper bound

e708354

Clean up assertion

75a6d94

Improve const correctness

d979f70

Add a unit test

5fdc898

Add a BoundsCheckingVectorIterator so we can test the related optimizations in ClippingIterator

02e1673

Add missing #include

ae434a5

Small cleanup

f667b75

Add a class comment

723a274

Implement ClippingIterator::NextAndGetResult in terms of the wrapped iterator's NextAndGetResult, add test

9336fad

Add assertions re: MaybeOutOfLowerBound/UpperBoundCheckResult

97b200f

ltamasi force-pushed the clipping_iterator branch from c8a241f to 97b200f Compare June 9, 2021 19:41

facebook-github-bot closed this in db325a5 Jun 9, 2021

facebook-github-bot added the Merged label Jun 9, 2021
