
storage: assert no inflight writes when closing segment #17032

Merged: 1 commit merged into redpanda-data:dev on Mar 14, 2024

Conversation

@nvartolomei (Contributor) commented Mar 13, 2024

Register the inflight write before calling append on the segment. Append
may dispatch a background write, and a subsequent call to
maybe_advance_stable_offset may run, before the append continuation does.

This was happening in CI when running with a RAM disk (very fast
storage).

When the appender advances its stable offset, a callback for advancing the
stable offset at segment level is also invoked:
segment::advance_stable_offset. This method uses the inflight writes to
determine how far the segment stable offset can be advanced. Because the
last inflight write wasn't registered yet, the stable offset can get
stuck.

In theory, because we use the segment stable offset as the reader upper
bound, this can lead to a situation where acked data is not visible, or to
worse consequences.
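
For illustration, a minimal sketch of the ordering this change establishes.
The type and function names below (fake_segment, append_batch) are
hypothetical, not the Redpanda API; only `_inflight` and the
emplace-before-append ordering come from the diff:

```cpp
#include <cstdint>
#include <map>

// Illustrative only; not the Redpanda types.
struct fake_segment {
    // expected end-of-write physical offset -> last logical offset in batch
    std::map<uint64_t, int64_t> _inflight;

    void append_batch(uint64_t expected_end_physical, int64_t last_offset) {
        // Register the write *before* handing it to the appender. With the
        // old ordering this emplace lived in the append continuation, which
        // could be scheduled only after the appender had already dispatched
        // the background write and called maybe_advance_stable_offset; at
        // that point advance_stable_offset finds no matching entry and the
        // segment stable offset gets stuck.
        _inflight.emplace(expected_end_physical, last_offset);

        // ... dispatch the (possibly background) write via the appender ...
    }
};
```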

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@@ -96,6 +96,8 @@ ss::future<> segment::close() {
co_await do_flush();
co_await do_close();

vassert(_inflight.empty(), "Closing segment with inflight writes.");
Member:

Is this a problem if the appender continues to live on and complete the inflight writes? Here inflight seems to only be tracking the inflight writes, not owning them.

Contributor Author:

We did acquire the write lock above, we did flush too. Why would we expect outstanding inflight writes?

Member:

Ahh I didn't notice that you had the fix below.

@dotnwat requested a review from andrwng on March 13, 2024 at 22:29

Comment on lines +525 to +529
_inflight.emplace(expected_end_physical, b.last_offset());

// proxy serialization to segment_appender
auto write_fut = _appender->append(b).then(
[this, &b, start_physical_offset] {
[this, &b, start_physical_offset, expected_end_physical] {
Member:

> When the appender advances its stable offset, a callback for advancing the
> stable offset at segment level is also invoked:
> segment::advance_stable_offset. This method uses the inflight writes to
> determine how far the segment stable offset can be advanced.

Ahh, I see now. Great find. I'm very surprised this never happened in Debug builds where the scheduler randomizes the order of execution of runnable continuations, and that it took very fast storage to trigger.

> Because the last inflight write wasn't registered yet, the stable offset
> can get stuck.

This part is pretty cool, that normally we could drop the upcall, but it would only really matter if no new appends arrived promptly.

Contributor:

> I'm very surprised this never happened in Debug builds

+1 also I'm wondering if this is somehow more likely with the new dirty tracking logic in the batch cache, though nothing really jumps out.

Contributor Author:

Generally it was masked by code in `segment::do_flush`:

ss::future<> segment::do_flush() {
    _generation_id++;
    if (!_appender) {
        return ss::make_ready_future<>();
    }
    auto o = _tracker.dirty_offset;
    auto fsize = _appender->file_byte_offset();
    return _appender->flush().then([this, o, fsize] {
        // never move committed offset backward, there may be multiple
        // outstanding flushes once the one executed later in terms of offset
        // finishes we guarantee that all previous flushes finished.
        _tracker.committed_offset = std::max(o, _tracker.committed_offset);
        // ########## HERE ##########
        _tracker.stable_offset = _tracker.committed_offset;
        // ########## HERE ##########
        _reader->set_file_size(std::max(fsize, _reader->file_size()));
        clear_cached_disk_usage();
    });
}

However, very importantly, this code also contains a bug: with concurrent
flushes, data can go missing. See #17039.

@dotnwat (Member) commented Mar 14, 2024

@nvartolomei needs a cover letter too.

@andrwng (Contributor) left a comment:

Great catch! We really ought to keep a diagram of sorts around to keep track of control flow of the append path and all its background work...

Is this not marked for backport because the chances of a reader being unable to read acked data are very low (maybe even only theoretical)?

@@ -96,6 +96,8 @@ ss::future<> segment::close() {
co_await do_flush();
co_await do_close();

vassert(_inflight.empty(), "Closing segment with inflight writes.");

Contributor:

> because we use the segment stable offset as the reader upper bound, this
> can lead to a situation where acked data is not visible, or to worse
> consequences.

Just making sure: even though we're adding to _inflight before appending, we aren't at risk of having readers read that data, right (especially important since the append may not have been dispatched)? It's just that now, advance_stable_offset will at least be able to consider the in-flight write even if we haven't called append() yet?

Contributor Author:

Correct. advance_stable_offset operates on file offsets that have already been written, and only considers inflight entries at or below that offset.
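
A hedged sketch of that behaviour, assuming `_inflight` maps the expected
end-of-write physical offset to the last logical offset of the batch (as in
the diff in this PR). The free function below is illustrative, not the
actual `segment::advance_stable_offset` implementation:

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// end-of-write physical (file) offset -> last logical offset of that batch
using inflight_map = std::map<uint64_t, int64_t>;

// Given the physical offset the appender reports as written, return the new
// stable logical offset, if any. Only inflight entries at or below the
// written offset are considered, so even though an entry is registered
// before append(), readers are never exposed to data that has not actually
// been written yet.
std::optional<int64_t>
advance_stable_offset(inflight_map& inflight, uint64_t written_physical) {
    auto end = inflight.upper_bound(written_physical);
    if (end == inflight.begin()) {
        return std::nullopt; // nothing newly covered by the written prefix
    }
    const int64_t new_stable = std::prev(end)->second; // last covered batch
    inflight.erase(inflight.begin(), end); // these writes are now stable
    return new_stable;
}
```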

@nvartolomei (Contributor Author) commented:

/cdt

nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 14, 2024
Dirty entries are ones that aren't eligible for eviction as they might
have not yet been written durably to disk. Evicting them would otherwise
lead to "disappeared" data on cache miss.

We are making an implicit assumption that readers that want to read
beyond the storage stable offset will always read through the batch cache.
Otherwise, the read-after-write semantics won't hold, as the data
buffered in the appender chunk cache isn't exposed.

---

Batch cache entries are `batch_cache::range` data structures. A range is
a 32KiB memory arena with some ancillary data structures. It is
associated with a `batch_cache_index`. `batch_cache_index` has 1-to-1
mapping with a segment. When a batch is put in the cache, the
`batch_cache_index` will append to an existing range or allocate a new
one and insert it into the `batch_cache`.

We propose to track dirty batches at `batch_cache::range` granularity
rather than per batch.

Using the insight that dirty batches are inserted with monotonically
increasing offsets, and the fact that we always mark as clean (clear the
dirty flag) for a prefix of offsets, it is enough to track only the
maximum dirty batch offset in a range and advance it when new dirty
batches are inserted. For marking offsets as clean (on flush), we check
whether `clean_offset >= max_dirty_offset` and, if so, mark the range as
clean. Otherwise, nothing needs to be done.

A `batch_cache_index` can be associated with multiple ranges which
potentially need to be marked as clean. As an optimization, we propose
to track which ranges might contain dirty data and, when the request for
marking a prefix of offsets comes in, to iterate only through this subset
of ranges. We will use this information together with the already
existing index which maps offsets to ranges
`absl::btree_map<model::offset, batch_cache::entry>` to iterate only
through the entries and ranges which contain the dirty offsets we are
interested in. The complexity of this iteration is O(N) where N is the
number of dirty batches that must be marked as clean.

To this end, we are introducing a new data structure called
`batch_cache_dirty_tracker` to keep track of minimum and maximum dirty
offsets in an index that we need for the optimization.

Technically, we could achieve a similar performance optimization by doing
a binary search over the index, using the desired mark-clean `up_to` offset
as a needle, and then iterating backwards through ranges, marking them as
clean and stopping once we hit a clean range. However, the new data
structure has some additional benefits.

It facilitates some invariant checks:
  - Dirty offsets are monotonic: `vassert(_max <= range.first && _max <
    range.second, ...)`
  - Truncation is requested only with a clean index:
    `vassert(_dirty_tracker.clean(), ...)`. See the next section for why
    this is important.
  - At the end of batch cache / batch cache index lifetime there are no
    dirty batches left in the index. This already caught 2 bugs in the
    existing code:
      1) redpanda-data#17032
      2) redpanda-data#17039

Also, it improves the debugging experience as we include the state of
the dirty tracker in the `batch_cache_index` ostream insertion operator.

---

Truncation needs special attention too. When truncation is requested, we
must evict batches with offsets higher than the truncation point but still
retain dirty batches with smaller offsets. Since we can't have holes in
the ranges and can evict only at range granularity, we require a flush
prior to truncation. It is already present, so no changes are required.
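
To make the range-level scheme concrete, a hedged, self-contained sketch
(the fake_range and fake_dirty_tracker types are illustrative stand-ins,
not the actual batch_cache code): each range remembers only its maximum
dirty offset, and the index-level tracker keeps the dirty window plus the
invariant checks the commit message describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Per-range state. Dirty batches arrive with monotonically increasing
// offsets, so the maximum dirty offset is enough: a flush that reaches it
// makes the whole range clean; a flush below it changes nothing.
struct fake_range {
    int64_t max_dirty_offset = -1; // -1 means the range is clean

    void mark_dirty(int64_t first, int64_t last) {
        // Monotonic inserts (mirrors the spirit of the vassert above).
        assert(max_dirty_offset < first && first <= last);
        max_dirty_offset = last;
    }

    void mark_clean_up_to(int64_t clean_offset) {
        if (clean_offset >= max_dirty_offset) {
            max_dirty_offset = -1; // whole range is clean now
        } // otherwise: only a prefix was flushed, nothing to do
    }

    bool clean() const { return max_dirty_offset < 0; }
};

// Index-level tracker: remembers the [min, max] dirty window so that
// marking a prefix of offsets clean only needs to visit the ranges inside
// it, and so that invariants (monotonic dirty inserts, "truncate only when
// clean", "nothing dirty left at destruction") can be asserted.
struct fake_dirty_tracker {
    int64_t min_dirty = std::numeric_limits<int64_t>::max();
    int64_t max_dirty = -1;

    void mark_dirty(int64_t first, int64_t last) {
        assert(max_dirty < first && first <= last);
        min_dirty = std::min(min_dirty, first);
        max_dirty = last;
    }

    bool clean() const { return max_dirty < 0; }

    void mark_clean_up_to(int64_t up_to) {
        if (up_to >= max_dirty) {
            min_dirty = std::numeric_limits<int64_t>::max();
            max_dirty = -1; // everything flushed, tracker is clean
        } else {
            min_dirty = std::max(min_dirty, up_to + 1); // shrink the window
        }
    }
};
```

A truncation path would then be guarded by something like
assert(tracker.clean()), mirroring the `vassert(_dirty_tracker.clean(), ...)`
mentioned above.
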
@nvartolomei (Contributor Author) commented Mar 14, 2024

Known cdt failures:

  • Unrelated: rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-1-9(1) example="ERROR 2024-03-14 11:55:32,224 [shard 0:au ] s3 - s3_client.cc:469 - Accessing panda-bucket-b3a2fc8c-e1f8-11ee-8182-c3cdd68eb941, unexpected REST API error "" detected, code: RequestTimeout, request_id: E6840FAB839FF7F5, resource:"> #17101

@dotnwat merged commit 6d0a7bf into redpanda-data:dev on Mar 14, 2024 (15 of 18 checks passed)
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 14, 2024 (same commit message as above)
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 15, 2024
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 15, 2024
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 15, 2024
nvartolomei added a commit to nvartolomei/redpanda that referenced this pull request Mar 15, 2024