Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BaseDeltaIterator now adheres to ReadOptions iterate_upper_bound #7214

Closed
wants to merge 33 commits into from

Conversation

adamretter
Copy link
Collaborator

@adamretter adamretter commented Aug 2, 2020

Previously iterate_upper_bound set on a BaseDeltaIterator was not correctly respected.

There were no existing tests for this, so I added quite a lot of tests first.

As well as several fixes to BaseDeltaIterator, this introduces the functions Iterator::ChecksLowerBound and Iterator::ChecksUpperBound these can be used by a wrapping Iterator, such as BaseDeltaIterator, to prevent duplication the bounds check work. At present other wrapped iterators that do correct check bounds do not necessarily implement these, a future piece of work would be to have other bounds checking iterators (such as ForwardIterator and ReverseIterator) implement these.

Additionally, whilst this PR also has added much of the base work for correctly implementing iterate_lower_bound on BaseDeltaIterator, it's focus was that of the more commonly used iterate_upper_bound. The author makes no claims as to the correctness of iterate_lower_bound.

@adamretter adamretter added the bug Confirmed RocksDB bugs label Aug 2, 2020
Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks reasonable to me. I would request a few more tests with SeekToLast/SeekToPrev with a key passed and at the upper bound, and then some Prev() from there.

@adamretter
Copy link
Collaborator Author

@mrambacher Good shout! It looks like there are some issues with seekToLast as well :-( I will look into them...

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be tests for Seek, SeekForPrev, when seeking to the Upper Bound or beyond?

include/rocksdb/utilities/write_batch_with_index.h Outdated Show resolved Hide resolved
Comment on lines 1080 to 1092
// iter->SeekToLast();
// ASSERT_OK(iter->status());
// ASSERT_TRUE(iter->Valid());
// iter->Prev();
// ASSERT_OK(iter->status());
// ASSERT_TRUE(iter->Valid());

// AssertIter(iter.get(), "k06", "v06");
// iter->Next();
// AssertIter(iter.get(), "k07", "k07");
// iter->Next();
// ASSERT_OK(iter->status());
// ASSERT_FALSE(iter->Valid()) << "Should have reached upper_bound";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrambacher It was a work in progress.

Just to update you on progress - The approach I took initially in this PR was not strong enough, with further tests other problems emerged. I now have many more tests and just one that is now failing. As soon as I have that test passing, I will push my new approach to this PR...

@adamretter adamretter marked this pull request as draft August 8, 2020 20:56
@adamretter adamretter marked this pull request as ready for review August 11, 2020 18:12
@adamretter
Copy link
Collaborator Author

@mrambacher @pdillinger Okay this is a completely reworked approach. Solving the issue turned out to be much more involved than I initially though as there were other hidden problems. The fix once known is relatively simple. This is my 5th attempt at fixing it, as I kept adding tests which would show additional problems.

Let me know your thoughts... Hopefully this is pretty much done now ;-)

utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
Comment on lines 307 to 318
// NOTE: we don't need the bounds check on
// base_iterator if the base iterator has an
// upper_bounds_check already
return base_iterator_->Valid() &&
(base_iterator_->has_upper_bound()
? true
: IsWithinBounds(base_iterator_->key()));
}
bool DeltaValid() const {
return delta_iterator_->Valid() &&
IsWithinBounds(delta_iterator_->Entry().key);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very expensive to do on every call. It seems like it would be better to do something in AdvanceDelta/Base to check if the advanced iterator is within the bounds and have that method set a flag.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that may be a possible optimisation...

In the first pass I wanted to focus on correctness with as little modification to the current approach as possible. As it is tricky to say the least! I would still like to see this PR merged before optimisations are considered. If there is no upper or lower bound set I don't think this PR will have much (if any) impact on current performance.

I suspect few people can have been using bounds with WBWI - otherwise like me, they would likely have noticed the bugs ;-)

I would prefer that later a subsequent PR could be created which would then add such an optimisation for when bounds are in play.

utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SeekToPrev is still broken if k is > upper bound.

@@ -123,14 +123,36 @@ class Iterator : public Cleanable {
* checking ReadOptions::iterate_lower_bound,
* false otherwise.
*/
virtual bool has_lower_bound() const { return false; }
virtual bool check_lower_bound() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused as to when/how these methods will be used. They are new and the only Iterators that implement them are the test ones (even the BaseDelta does not implement it). It seems like we might need a solution without these new methods, as all non-test solutions would effectively be without them. Or am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly remove them. They are there to allow an optimisation where when you have an iterator wrapping an iterator (several places in Rocks) if your underlying iterator handles upper and lower bounds, you don't need to check them again.

I think it would be better to keep them, and expand the use of them. For example, ForwardIterator and BackwardIterator should likely implement them as they already handle upper and lower bounds checks respectively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the BaseDeltaIterator implement these functions?

Not to complicate your world (and I am in no way suggesting you fix this, perhaps document it) but if the BaseIterator had bounds on it but the ReadOptions did not, wouldn't you potentially get really strange results? That sounds more like something that should be an assert somewhere...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is only true in the test code. Outside of tests ,the "NewIteratorFromBase" takes in an Iterator, which becomes the base iterator. In this case, the base iterator could have very different bounds than the DeltaIterator.

Or am I missing something?

I am not suggesting that should be legal BTW, only if it happens I do not know what the results would look like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I will add some tests for that too...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is already handled, and I just added the tests: TestIteraratorWithBaseUpperBoundOnBaseNoUpperBoundOnBatch TestIteraratorWithBaseNoUpperBoundOnBaseUpperBoundOnBatch for further evidence of this :-)

Comment on lines 67 to 69
if (!base_iterator_->check_upper_bound()) {
// no, so we have to seek it to before base_upper_bound
base_iterator_->Seek(*(base_upper_bound));
Copy link
Contributor

@mrambacher mrambacher Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case/code for base_iterator_ and delta_iterator_ should be the same. Seek to the read_options->upper_bound. If the iterator is not valid, SeekToLast, otherwise Prev.

I do not think the new methods are required

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow you here... the code for base_iterator_ and delta_iterator_ are the same apart from the case where the base_iterator_ might already handle upper/lower bounds.
I did look at making delta_iterator_ also handle the upper/lower bounds directly itself, but it was too difficult as it is a very different type of iterator which is really just concerned with the SkipList really.

I am fairly certain I have the logic correct as I kept adding tests to ensure this. However, of course it is indeed possible that I missed something... can you see a test case where the logic doesn't hold?

Perhaps it is easier to have a call to discuss?

@adamretter
Copy link
Collaborator Author

adamretter commented Aug 12, 2020

I think SeekToPrev is still broken if k is > upper bound.

@mrambacher I thought that the upper_bound should not be considered in SeekToPrev, because that would create a backward iterator, and upper_bound is only considered for a forward iterator according to the docs.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting close. I think there is a crash if the base iterator has a bounds but the delta does not -- I do not know what is expected here.

I like the new tests for SeekToPrev. I think having a test where the BaseDeltaIterator has a bounds but the inner one does not would also be helpful.

@@ -123,14 +123,36 @@ class Iterator : public Cleanable {
* checking ReadOptions::iterate_lower_bound,
* false otherwise.
*/
virtual bool has_lower_bound() const { return false; }
virtual bool check_lower_bound() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the BaseDeltaIterator implement these functions?

Not to complicate your world (and I am in no way suggesting you fix this, perhaps document it) but if the BaseIterator had bounds on it but the ReadOptions did not, wouldn't you potentially get really strange results? That sounds more like something that should be an assert somewhere...

utilities/write_batch_with_index/write_batch_with_index.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't finalize rest of review, but looks pretty solid at first overview. :)

* checking ReadOptions::iterate_lower_bound,
* false otherwise.
*/
virtual bool check_lower_bound() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_lower_bound is not an existing or new stateful member, so why name it like an accessor with a verb name? How about ChecksLowerBound or DoesCheckLowerBound?

* not mean that it is checked. This can however
* be determined by calling Iterator::check_lower_bound().
*/
virtual const Slice* iterate_lower_bound() const { return nullptr; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a verb, where the reader is supposed to understand from the use of lowercase_with_underscores that this is an accessor not an action. Since we're already in an iterator, can we just call it lower_bound?

* checking ReadOptions::iterate_upper_bound,
* false otherwise.
*/
virtual bool check_upper_bound() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

* not mean that it is checked. This can however
* be determined by calling Iterator::check_upper_bound().
*/
virtual const Slice* iterate_upper_bound() const { return nullptr; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@adamretter
Copy link
Collaborator Author

@pdillinger I have renamed the members as you suggested. Thanks

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update PR description to reflect the end state of these changes, and what follow-up work is recommended (more Iterators implement the new functions for improved performance?)

And please update the comments on new Iterator functions to say that implementations may report not checking bounds when they actually are, so callers should take a false / null return value as "maybe" instead of "definitely not".

* checking ReadOptions::iterate_lower_bound,
* false otherwise.
*/
virtual bool ChecksLowerBound() const { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, another thing: Can't we just get rid of ChecksLower/UpperBound and callers can use nullness of lower/upper_bound? That should be simpler, no contract to satisfy between the two pairs of functions, and there should generally be no extra logic involved in actually returning the Slice. (It should already be allocated with lifetime of the Iterator or more.)

Just because a lower bound is present, it does not mean that it is checked.

When is that valid behavior? It seems wrong to bake that possibility into the public Iterator interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is that valid behavior? It seems wrong to bake that possibility into the public Iterator interface.

That is the current (bad) behaviour of many of the iterators in the existing code-base ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe it's easier to update those to report what their lower and upper bounds should be rather than fully updating them to check the bounds?

Still wow. Baking into a API a way to say "I have a bug and here's what it is. If you wrap me, can you please fix it?"

Copy link
Collaborator Author

@adamretter adamretter Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still wow. Baking into a API a way to say "I have a bug and here's what it is. If you wrap me, can you please fix it?"

That's not what it does at all :-(

If an Iterator which is wrapped supports upper and lower bounds checks, it can inform the wrapping iterator so that the wrapping iterator need not worry about it. It won't cause any harm if they both make the checks. In the past the wrapping iterator had no way of knowing whether the underlying iterator was also making the checks, so it would always have to make the checks itself as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the use case for exposing what your lower bound is without checking it. If that's a transitional state in migrating code, is it really necessary as part of a public API? Actually, should these functions be protected instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor do I think an iterator needs to be able to express "I would be checking bounds if I had one."

My contention is still that we only need optional upper and lower bounds. If an iterator is simply the UNION of several other iterators, its lower/upper bound would be the smallest/largest (resp.) bound of its constituents, where no bound is considered smaller/larger than any specific value. We don't have to do any addition checking to represent that UNION.

If an iterator is INTERSECTED with a range, we can ignore that additional constraint if our existing lower and upper bounds are within that range. (Performance optimization.) But if our lower/upper bound is outside of that range constraint, we have to use the lower and/or upper range bound as our refined lower/upper bound AND check that bound ourselves.

Perhaps it is helpful to think of the union and intersection roles of a wrapping iterator independently as I have described.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger I think I agree with your sentiment here and personally am not a fan of these functions -- partially because they are not implemented by more of the Iterator hierarchy.

Having said that, it seems like there should be a means to tell if an Iterator should check its bounds or assume the parent iterator can do it instead. I am not sure if these functions are the proper ones to fulfill that contract. I think the Iterator paradigm should have some sort of "Filter" to allow filtering of bounds (and prefixes). This might allow some better optimizations as well. But I am not sure that a "Filter" is in scope for this PR and think it would require more thought to implement properly.

Also, shouldn't the math you stated above be the largest lower bound and smallest upper?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pdillinger and @mrambacher . The interface of iterator should be made simpler, rather than more complicated. The interface is already in the confusing part, so we should not make it even more confusing. We do have some quite tricky functions in InternalIterator for performance reasons. They are not ideal but having an internal implementation a little bit messier is relatively much more acceptable than make public interface complicated.

@adamretter
Copy link
Collaborator Author

@pdillinger I believe I have added the comments and amended the PR description now as you requested.

@adamretter adamretter force-pushed the wbwi-iterator-upper-bound branch 2 times, most recently from ee48786 to 0868be6 Compare September 15, 2020 16:06
@@ -172,6 +233,18 @@ class BaseDeltaIterator : public Iterator {
return delta_iterator_->status();
}

bool ChecksLowerBound() const override { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "return true"? If so, I still question why we need the ChecksLower/UpperBound functions.

If you do mean "return false," can you explain?

Copy link
Collaborator Author

@adamretter adamretter Sep 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdillinger I do mean false. This PR is about implementing support for the upper_bound, much of the machinery is there for the lower_bound because it is common to both upper and lower... but my focus was implementing for the upper_bound.

}

bool BaseIsWithinBounds() const {
const Slice* lower_bound = base_iterator_lower_bound();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems for efficiency we should check IsMovingBackward() before looking up lower bound. (Same for forward and upper bound.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pdillinger that's a good suggestion. I have now pushed an additional commit for that.

@pdillinger
Copy link
Contributor

From private discussion:

(me) I'm still concerned about "polluting" the public Iterator API with implementation concerns.

(Adam) ... I also just noticed that there is an interface InternalIteratorBase which has at least one method that looks very similar to what I am trying to do: UpperBoundCheckResult . However it does not extend Iterator, ...

It looks like @siying wrote that code pretty recently in #7200. Any help/advice Siying?

* not check the upper bound, or that it does not
* report that it is performing the check.
*/
virtual bool ChecksUpperBound() const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain it? RocksDB's Iterator is always supposed to have upper bound checked. Did you see a bug somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find that to be the case. There are iterators which don't check upper/lower bounds, which is what triggered this PR. When you wrap an iterator, if the wrapped iterator does not implement upper/lower bounds checks, you may need to know that, so that you can implement the checks in your wrapper iterator to get the behaviour you need.

@cbi42
Copy link
Member

cbi42 commented Nov 21, 2023

Implemented in #11680

@cbi42 cbi42 closed this Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed RocksDB bugs CLA Signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants