-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BaseDeltaIterator now adheres to ReadOptions iterate_upper_bound #7214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks reasonable to me. I would request a few more tests with SeekToLast/SeekToPrev with a key passed and at the upper bound, and then some Prev() from there.
@mrambacher Good shout! It looks like there are some issues with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there also be tests for Seek, SeekForPrev, when seeking to the Upper Bound or beyond?
utilities/write_batch_with_index/write_batch_with_index_test.cc
Outdated
Show resolved
Hide resolved
// iter->SeekToLast(); | ||
// ASSERT_OK(iter->status()); | ||
// ASSERT_TRUE(iter->Valid()); | ||
// iter->Prev(); | ||
// ASSERT_OK(iter->status()); | ||
// ASSERT_TRUE(iter->Valid()); | ||
|
||
// AssertIter(iter.get(), "k06", "v06"); | ||
// iter->Next(); | ||
// AssertIter(iter.get(), "k07", "k07"); | ||
// iter->Next(); | ||
// ASSERT_OK(iter->status()); | ||
// ASSERT_FALSE(iter->Valid()) << "Should have reached upper_bound"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrambacher It was a work in progress.
Just to update you on progress - The approach I took initially in this PR was not strong enough, with further tests other problems emerged. I now have many more tests and just one that is now failing. As soon as I have that test passing, I will push my new approach to this PR...
89695e3
to
4abb167
Compare
@mrambacher @pdillinger Okay this is a completely reworked approach. Solving the issue turned out to be much more involved than I initially though as there were other hidden problems. The fix once known is relatively simple. This is my 5th attempt at fixing it, as I kept adding tests which would show additional problems. Let me know your thoughts... Hopefully this is pretty much done now ;-) |
// NOTE: we don't need the bounds check on | ||
// base_iterator if the base iterator has an | ||
// upper_bounds_check already | ||
return base_iterator_->Valid() && | ||
(base_iterator_->has_upper_bound() | ||
? true | ||
: IsWithinBounds(base_iterator_->key())); | ||
} | ||
bool DeltaValid() const { | ||
return delta_iterator_->Valid() && | ||
IsWithinBounds(delta_iterator_->Entry().key); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very expensive to do on every call. It seems like it would be better to do something in AdvanceDelta/Base to check if the advanced iterator is within the bounds and have that method set a flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that may be a possible optimisation...
In the first pass I wanted to focus on correctness with as little modification to the current approach as possible. As it is tricky to say the least! I would still like to see this PR merged before optimisations are considered. If there is no upper or lower bound set I don't think this PR will have much (if any) impact on current performance.
I suspect few people can have been using bounds with WBWI - otherwise like me, they would likely have noticed the bugs ;-)
I would prefer that later a subsequent PR could be created which would then add such an optimisation for when bounds are in play.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SeekToPrev is still broken if k is > upper bound.
include/rocksdb/iterator.h
Outdated
@@ -123,14 +123,36 @@ class Iterator : public Cleanable { | |||
* checking ReadOptions::iterate_lower_bound, | |||
* false otherwise. | |||
*/ | |||
virtual bool has_lower_bound() const { return false; } | |||
virtual bool check_lower_bound() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused as to when/how these methods will be used. They are new and the only Iterators that implement them are the test ones (even the BaseDelta does not implement it). It seems like we might need a solution without these new methods, as all non-test solutions would effectively be without them. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can certainly remove them. They are there to allow an optimisation where when you have an iterator wrapping an iterator (several places in Rocks) if your underlying iterator handles upper and lower bounds, you don't need to check them again.
I think it would be better to keep them, and expand the use of them. For example, ForwardIterator
and BackwardIterator
should likely implement them as they already handle upper and lower bounds checks respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the BaseDeltaIterator implement these functions?
Not to complicate your world (and I am in no way suggesting you fix this, perhaps document it) but if the BaseIterator had bounds on it but the ReadOptions did not, wouldn't you potentially get really strange results? That sounds more like something that should be an assert somewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is only true in the test code. Outside of tests ,the "NewIteratorFromBase" takes in an Iterator, which becomes the base iterator. In this case, the base iterator could have very different bounds than the DeltaIterator.
Or am I missing something?
I am not suggesting that should be legal BTW, only if it happens I do not know what the results would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I will add some tests for that too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is already handled, and I just added the tests: TestIteraratorWithBaseUpperBoundOnBaseNoUpperBoundOnBatch
TestIteraratorWithBaseNoUpperBoundOnBaseUpperBoundOnBatch
for further evidence of this :-)
if (!base_iterator_->check_upper_bound()) { | ||
// no, so we have to seek it to before base_upper_bound | ||
base_iterator_->Seek(*(base_upper_bound)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the case/code for base_iterator_
and delta_iterator_
should be the same. Seek to the read_options->upper_bound
. If the iterator is not valid, SeekToLast
, otherwise Prev
.
I do not think the new methods are required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow you here... the code for base_iterator_
and delta_iterator_
are the same apart from the case where the base_iterator_
might already handle upper/lower bounds.
I did look at making delta_iterator_
also handle the upper/lower bounds directly itself, but it was too difficult as it is a very different type of iterator which is really just concerned with the SkipList really.
I am fairly certain I have the logic correct as I kept adding tests to ensure this. However, of course it is indeed possible that I missed something... can you see a test case where the logic doesn't hold?
Perhaps it is easier to have a call to discuss?
@mrambacher I thought that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is getting close. I think there is a crash if the base iterator has a bounds but the delta does not -- I do not know what is expected here.
I like the new tests for SeekToPrev. I think having a test where the BaseDeltaIterator has a bounds but the inner one does not would also be helpful.
include/rocksdb/iterator.h
Outdated
@@ -123,14 +123,36 @@ class Iterator : public Cleanable { | |||
* checking ReadOptions::iterate_lower_bound, | |||
* false otherwise. | |||
*/ | |||
virtual bool has_lower_bound() const { return false; } | |||
virtual bool check_lower_bound() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the BaseDeltaIterator implement these functions?
Not to complicate your world (and I am in no way suggesting you fix this, perhaps document it) but if the BaseIterator had bounds on it but the ReadOptions did not, wouldn't you potentially get really strange results? That sounds more like something that should be an assert somewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't finalize rest of review, but looks pretty solid at first overview. :)
include/rocksdb/iterator.h
Outdated
* checking ReadOptions::iterate_lower_bound, | ||
* false otherwise. | ||
*/ | ||
virtual bool check_lower_bound() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_lower_bound
is not an existing or new stateful member, so why name it like an accessor with a verb name? How about ChecksLowerBound
or DoesCheckLowerBound
?
include/rocksdb/iterator.h
Outdated
* not mean that it is checked. This can however | ||
* be determined by calling Iterator::check_lower_bound(). | ||
*/ | ||
virtual const Slice* iterate_lower_bound() const { return nullptr; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a verb, where the reader is supposed to understand from the use of lowercase_with_underscores that this is an accessor not an action. Since we're already in an iterator, can we just call it lower_bound
?
include/rocksdb/iterator.h
Outdated
* checking ReadOptions::iterate_upper_bound, | ||
* false otherwise. | ||
*/ | ||
virtual bool check_upper_bound() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
include/rocksdb/iterator.h
Outdated
* not mean that it is checked. This can however | ||
* be determined by calling Iterator::check_upper_bound(). | ||
*/ | ||
virtual const Slice* iterate_upper_bound() const { return nullptr; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
@pdillinger I have renamed the members as you suggested. Thanks |
14c9019
to
99db651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update PR description to reflect the end state of these changes, and what follow-up work is recommended (more Iterators implement the new functions for improved performance?)
And please update the comments on new Iterator functions to say that implementations may report not checking bounds when they actually are, so callers should take a false / null return value as "maybe" instead of "definitely not".
include/rocksdb/iterator.h
Outdated
* checking ReadOptions::iterate_lower_bound, | ||
* false otherwise. | ||
*/ | ||
virtual bool ChecksLowerBound() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, another thing: Can't we just get rid of ChecksLower/UpperBound and callers can use nullness of lower/upper_bound? That should be simpler, no contract to satisfy between the two pairs of functions, and there should generally be no extra logic involved in actually returning the Slice. (It should already be allocated with lifetime of the Iterator or more.)
Just because a lower bound is present, it does not mean that it is checked.
When is that valid behavior? It seems wrong to bake that possibility into the public Iterator interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is that valid behavior? It seems wrong to bake that possibility into the public Iterator interface.
That is the current (bad) behaviour of many of the iterators in the existing code-base ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe it's easier to update those to report what their lower and upper bounds should be rather than fully updating them to check the bounds?
Still wow. Baking into a API a way to say "I have a bug and here's what it is. If you wrap me, can you please fix it?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still wow. Baking into a API a way to say "I have a bug and here's what it is. If you wrap me, can you please fix it?"
That's not what it does at all :-(
If an Iterator which is wrapped supports upper and lower bounds checks, it can inform the wrapping iterator so that the wrapping iterator need not worry about it. It won't cause any harm if they both make the checks. In the past the wrapping iterator had no way of knowing whether the underlying iterator was also making the checks, so it would always have to make the checks itself as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the use case for exposing what your lower bound is without checking it. If that's a transitional state in migrating code, is it really necessary as part of a public API? Actually, should these functions be protected instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nor do I think an iterator needs to be able to express "I would be checking bounds if I had one."
My contention is still that we only need optional upper and lower bounds. If an iterator is simply the UNION of several other iterators, its lower/upper bound would be the smallest/largest (resp.) bound of its constituents, where no bound is considered smaller/larger than any specific value. We don't have to do any addition checking to represent that UNION.
If an iterator is INTERSECTED with a range, we can ignore that additional constraint if our existing lower and upper bounds are within that range. (Performance optimization.) But if our lower/upper bound is outside of that range constraint, we have to use the lower and/or upper range bound as our refined lower/upper bound AND check that bound ourselves.
Perhaps it is helpful to think of the union and intersection roles of a wrapping iterator independently as I have described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger I think I agree with your sentiment here and personally am not a fan of these functions -- partially because they are not implemented by more of the Iterator hierarchy.
Having said that, it seems like there should be a means to tell if an Iterator should check its bounds or assume the parent iterator can do it instead. I am not sure if these functions are the proper ones to fulfill that contract. I think the Iterator paradigm should have some sort of "Filter" to allow filtering of bounds (and prefixes). This might allow some better optimizations as well. But I am not sure that a "Filter" is in scope for this PR and think it would require more thought to implement properly.
Also, shouldn't the math you stated above be the largest lower bound and smallest upper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @pdillinger and @mrambacher . The interface of iterator should be made simpler, rather than more complicated. The interface is already in the confusing part, so we should not make it even more confusing. We do have some quite tricky functions in InternalIterator for performance reasons. They are not ideal but having an internal implementation a little bit messier is relatively much more acceptable than make public interface complicated.
@pdillinger I believe I have added the comments and amended the PR description now as you requested. |
ee48786
to
0868be6
Compare
@@ -172,6 +233,18 @@ class BaseDeltaIterator : public Iterator { | |||
return delta_iterator_->status(); | |||
} | |||
|
|||
bool ChecksLowerBound() const override { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "return true"? If so, I still question why we need the ChecksLower/UpperBound functions.
If you do mean "return false," can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger I do mean false. This PR is about implementing support for the upper_bound, much of the machinery is there for the lower_bound because it is common to both upper and lower... but my focus was implementing for the upper_bound.
} | ||
|
||
bool BaseIsWithinBounds() const { | ||
const Slice* lower_bound = base_iterator_lower_bound(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems for efficiency we should check IsMovingBackward() before looking up lower bound. (Same for forward and upper bound.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pdillinger that's a good suggestion. I have now pushed an additional commit for that.
From private discussion:
It looks like @siying wrote that code pretty recently in #7200. Any help/advice Siying? |
include/rocksdb/iterator.h
Outdated
* not check the upper bound, or that it does not | ||
* report that it is performing the check. | ||
*/ | ||
virtual bool ChecksUpperBound() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain it? RocksDB's Iterator is always supposed to have upper bound checked. Did you see a bug somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find that to be the case. There are iterators which don't check upper/lower bounds, which is what triggered this PR. When you wrap an iterator, if the wrapped iterator does not implement upper/lower bounds checks, you may need to know that, so that you can implement the checks in your wrapper iterator to get the behaviour you need.
… delta may have different bounds (as per PR review from @mrambacher)
…enumeration (as per PR review from @mrambacher)
…h is larger than key space (as per PR review from @mrambacher)
…eltaIterator does not
…per_bound (as requested by @mrambacher)
…nctions from the Iterator API as requested by @rambacher
c33559a
to
1049349
Compare
…read_options at runtime
…und of read_options at runtime
…ower_bound of read_options at runtime
Implemented in #11680 |
Previously
iterate_upper_bound
set on aBaseDeltaIterator
was not correctly respected.There were no existing tests for this, so I added quite a lot of tests first.
As well as several fixes to
BaseDeltaIterator
, this introduces the functionsIterator::ChecksLowerBound
andIterator::ChecksUpperBound
these can be used by a wrapping Iterator, such asBaseDeltaIterator
, to prevent duplication the bounds check work. At present other wrapped iterators that do correct check bounds do not necessarily implement these, a future piece of work would be to have other bounds checking iterators (such asForwardIterator
andReverseIterator
) implement these.Additionally, whilst this PR also has added much of the base work for correctly implementing
iterate_lower_bound
onBaseDeltaIterator
, it's focus was that of the more commonly usediterate_upper_bound
. The author makes no claims as to the correctness ofiterate_lower_bound
.