-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readers: evictable_reader: skip progress guarantee when next pos is partition start #13563
readers: evictable_reader: skip progress guarantee when next pos is partition start #13563
Conversation
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: scylladb#13491
CI state |
|
CI state |
|
CI state |
|
CI state |
while (next_mf && _tri_cmp(_next_position_in_partition, buffer().back().position()) <= 0) { | ||
// This loop becomes inifinite when next pos is a partition start. | ||
// In that case progress is guranteed anyway, so skip this loop entirely. | ||
while (!_next_position_in_partition.is_partition_start() && next_mf && _tri_cmp(_next_position_in_partition, buffer().back().position()) <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we access buffer().back().position()
below, right after this loop.
How do we know that the buffer isn't empty at that point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't that be done under while (next_mf)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we access
buffer().back().position()
below, right after this loop. How do we know that the buffer isn't empty at that point?
See !is_buffer_empty()
in the enclosing if
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I got confused and thought we're consuming from the buffer, but we're actually filling it.
while (next_mf && _tri_cmp(_next_position_in_partition, buffer().back().position()) <= 0) { | ||
// This loop becomes inifinite when next pos is a partition start. | ||
// In that case progress is guranteed anyway, so skip this loop entirely. | ||
while (!_next_position_in_partition.is_partition_start() && next_mf && _tri_cmp(_next_position_in_partition, buffer().back().position()) <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the comment. Why is it guaranteed we make progress on partition_start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making progress is only a concern due to range tombstone changes, which can have non-monotonically increasing positions. The evictable reader needs to ensure that each buffer fill, ends with a position, strictly larger than that of the previous buffer fill. When the next expected position is a partition start, this is guaranteed and need not be checked (partition start means a new partition is started, so we make partition-level progress).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Though I don't see why the current code is broken. You read the partition start and emit it, Then you read the partition end, which should have its position_in_partition greater than the partition start, which should stop the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's absolutely broken to compare position_in_partition without noticing that we changed partitions, as position_in_partition can't be compared across partitions. So I agree with the fix, just wondering why partition_end didn't save us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading an entire partition into memory is an OOM sentence if the partition is large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't see it. push_mutation_fragment() updates buffer().back().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't buffer.back() change during the loop? Ah, I guess it doesn't.
It does. And we keep changing it (by pushing new fragments) until the condition becomes false and we exit the loop. The problem is that if _next_position_in_partition
is partition_start
, now matter what we push to the buffer, _tri_cmp(_next_position_in_partition, buffer().back().position()) <= 0
will always hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I get it now. I thought that we check that the next unconsumed position is after the last consumed position, but we check against some position that isn't advanced. Thanks for bearing with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. The code is while (_next_position_in_partition <= back) advance_back...
But if this condition is supposed to check for progress against _next_position_in_partition
, shouldn't it be while (back <= (<?) _next_position_in_partition) advance_back...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. If you modify the test below to call fill_buffer()
a second time, this time it will read till the end of partition instead of reading another small batch of fragments - because the comparison is done in the wrong direction.
|
||
rd.fill_buffer().get(); | ||
auto buf1 = rd.detach_buffer(); | ||
BOOST_REQUIRE_EQUAL(buf1.size(), 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test fail before the patch with an infinite loop (or by consuming all the reader)? If not, it's just testing some detail of the implementation, not the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the test fails before the patch, by finding that the reader read more than expected into the buffer. Unfortunately, there is no way to write a test for this, without involving specific details about how readers work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than expected != the failure condition.
If before the fix would stop at 4 or 2 fragments, then the test doesn't reproduce the infinite loop.
If it's really unbounded, then you can have a reader with 10 fragments and assert that not all fragments were consumed, rather than some specific number was consumed (but another number would have been just as well, as long as it's not "consume the entire stream while some data-dependent condition holds".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the fix, the reader would read all 1003 fragments of test data. After the fix, it stops after 3 fragments, which is where it should stop, according to the precalculated max buffer size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I suggest to change the condition to < 10. Instead of enshrining some detail, let's check the actual failure (we can't check that the number of fragments is infinite, but 10 is a close approximation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reader not reading more than what its max-buffer-size is is actually part of the reader contract. But I can increase this number so that it allows the reader deciding to read a few more fragments than expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just queue it.
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563 (cherry picked from commit 72003dc)
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563 (cherry picked from commit 72003dc)
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563 (cherry picked from commit 72003dc)
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563 (cherry picked from commit 72003dc)
…artition start The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader. To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition. Fixes: #13491 Closes #13563 (cherry picked from commit 72003dc)
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…ition The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in scylladb#13491. There was already a fix in this area to handle `partition_start` fragments correctly - scylladb#13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in scylladb#13563) to detect this case. Fixes scylladb#13491
…re partition' from Kamil Braun The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s. Fixes #13491 Closes #14375 * github.com:scylladb/scylladb: readers: evictable_reader: don't accidentally consume the entire partition test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position
…re partition' from Kamil Braun The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s. Fixes #13491 Closes #14375 * github.com:scylladb/scylladb: readers: evictable_reader: don't accidentally consume the entire partition test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position (cherry picked from commit 586102b)
…re partition' from Kamil Braun The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s. Fixes #13491 Closes #14375 * github.com:scylladb/scylladb: readers: evictable_reader: don't accidentally consume the entire partition test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position (cherry picked from commit 586102b)
…re partition' from Kamil Braun The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the previous buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between. The code guranteeing this forward progress had a bug: the comparison between the position after the last buffer-fill and the current last fragment position was done in the wrong direction. So if the condition that we wanted to achieve was already true, we would continue filling the buffer until partition end which may lead to OOMs such as in #13491. There was already a fix in this area to handle `partition_start` fragments correctly - #13563 - but it missed that the position comparison was done in the wrong order. Fix the comparison and adjust one of the tests (added in #13563) to detect this case. After the fix, the evictable reader starts generating some redundant (but expected) range tombstone change fragments since it's now being paused and resumed. For this we need to adjust mutation source tests which were a bit too specific. We modify `flat_mutation_reader_assertions` to squash the redundant `r_t_c`s. Fixes #13491 Closes #14375 * github.com:scylladb/scylladb: readers: evictable_reader: don't accidentally consume the entire partition test: flat_mutation_reader_assertions: squash `r_t_c`s with the same position (cherry picked from commit 586102b)
The evictable reader must ensure that each buffer fill makes forward progress, i.e. the last fragment in the buffer has a position larger than the last fragment from the last buffer-fill. Otherwise, the reader could get stuck in an infinite loop between buffer fills, if the reader is evicted in-between.
The code guranteeing this forward change has a bug: when the next expected position is a partition-start (another partition), the code would loop forever, effectively reading all there is from the underlying reader.
To avoid this, add a special case to ignore the progress guarantee loop altogether when the next expected position is a partition start. In this case, progress is garanteed anyway, because there is exactly one partition-start fragment in each partition.
Fixes: #13491