Commit
Merge "Introduce flat_mutation_reader_v2" from Tomasz
" This series introduces a new version of the mutation fragment stream (called v2) which aims at improving range tombstone handling in the system. When compacting a mutation fragment stream (e.g. for sstable compaction, data query, repair), the compactor needs to accumulate range tombstones which are relevant for the yet-to-be-processed range. See range_tombstone_accumulator. One problem is that it has unbounded memory footprint because the accumulator needs to keep track of all the tombstoned ranges which are still active. Another, although more benign, problem is computational complexity needed to maintain that data structure. The fix is to get rid of the overlap of range tombstones in the mutation fragment stream. In v2 of the stream, there is no longer a range_tombstone fragment. Deletions of ranges of rows within a given partition are represented with range_tombstone_change fragments. At any point in the stream there is a single active clustered tombstone. It is initially equal to the neutral tombstone when the stream of each partition starts. The range_tombstone_change fragment type signify changes of the active clustered tombstone. All fragments emitted while a given clustered tombstone is active are affected by that tombstone. Like with the old range_tombstone fragments, the clustered tombstone is independent from the partition tombstone carried in partition_start. The memory needed to compact a stream is now constant, because the compactor needs to only track the current tombstone. Also, there is no need to expire ranges on each fragment because the stream emits a fragment when the range ends. This series doesn't convert all readers to v2. It introduces adaptors which can convert between v1 and v2 streams. Each mutation source can be constructed with either v1 or v2 stream factory, but it can be asked any version, performing conversion under the hood if necessary. 
In order to guarantee that v1 to v2 conversion produces a well-formed stream, this series needs to impose a constraint on v1 streams to trim range tombstones to clustering restrictions. Otherwise, the v1->v2 converter could produce range tombstone changes which lie outside query restrictions, making the stream non-canonical.

The v2 stream is strict about range tombstone trimming. It emits range tombstone changes which reflect range tombstones trimmed to query restrictions and fast-forwarding ranges. This makes the stream more canonical, meaning that for a given set of writes, querying the database should produce the same stream of fragments for given restrictions. There is less ambiguity in how the writes are represented in the fragment stream. That wasn't the case with v1. For example, a given set of deletions could be produced either as one range_tombstone, or as many, split and/or deoverlapped with other fragments. A canonical stream makes diff calculation easier.

The mc sstable reader was converted to v2 because it seemed like a comparable effort to do that versus implementing range tombstone trimming in v1.

The classes related to mutation fragment streams were cloned: flat_mutation_reader_v2, mutation_fragment_v2, and related concepts.

Refs #8625. To fully fix #8625 we need to finish the transition and get rid of the converters. Converters accumulate range tombstones.

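The trimming and canonicalization described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the converter from this series: tombstones are modeled as half-open integer ranges with a single timestamp, 0 plays the role of the neutral tombstone, and `trim` and `to_changes` are hypothetical names. The quadratic scan is kept for brevity.

```python
from typing import List, Optional, Tuple

RangeTombstone = Tuple[int, int, int]  # (start, end, timestamp), covers [start, end)
Change = Tuple[int, int]               # (position, new active timestamp); 0 = neutral


def trim(rt: RangeTombstone, lo: int, hi: int) -> Optional[RangeTombstone]:
    """Clip a v1-style tombstone to the query range [lo, hi)."""
    start, end, ts = rt
    start, end = max(start, lo), min(end, hi)
    return (start, end, ts) if start < end else None


def to_changes(tombstones: List[RangeTombstone], lo: int, hi: int) -> List[Change]:
    """Turn possibly overlapping tombstones into v2-style changes."""
    trimmed = [t for t in (trim(rt, lo, hi) for rt in tombstones) if t]
    # Only tombstone bounds can change which tombstone is active.
    positions = sorted({p for s, e, _ in trimmed for p in (s, e)})
    changes: List[Change] = []
    active = 0
    for pos in positions:
        # The newest tombstone covering pos wins; 0 if nothing covers it.
        ts = max((t for s, e, t in trimmed if s <= pos < e), default=0)
        if ts != active:  # emit only real changes, which merges adjacent pieces
            changes.append((pos, ts))
            active = ts
    return changes


# One deletion, written as a single tombstone or as two adjacent pieces.
whole = to_changes([(0, 10, 5)], lo=2, hi=8)
split = to_changes([(0, 4, 5), (4, 10, 5)], lo=2, hi=8)
# Two overlapping deletions with different timestamps.
overlapping = to_changes([(0, 6, 5), (3, 10, 7)], lo=2, hi=8)
```

Both representations of the same deletion yield the same change sequence, and no change falls outside the query range. Note that this sketch has to hold every tombstone at once, which is the accumulation cost the converters still pay.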
Tests:

  - unit [dev]
"

* tag 'flat_mutation_reader_range_tombstone_split-v3.2' of github.com:tgrabiec/scylla: (26 commits)
  tests: mutation_source_test: Run tests with conversions inserted in the middle
  tests: mutation_source_tests: Unroll run_flat_mutation_reader_tests()
  tests: Add tests for flat_mutation_reader_v2
  flat_mutation_reader: Update the doc to reflect range tombstone trimming
  sstables: Switch the mx reader to flat_mutation_reader_v2
  row_cache: Emit range tombstone adjacent to upper bound of population range
  tests: sstables: Fix test assertions to not expect more than they should
  flat_mutation_reader: Trim range tombstones in make_flat_mutation_reader_from_fragments()
  clustering_ranges_walker: Emit range tombstone changes while walking
  tests: flat_mutation_reader_assertions_v2: Adapt to the v2 stream
  Clone flat_reader_assertions into flat_reader_assertions_v2
  test: lib: simple_schema: Reuse new_tombstone()
  test: lib: simple_schema: Accept tombstone in delete_range()
  mutation_source: Introduce make_reader_v2()
  partition_snapshot_flat_reader: Trim range tombstones to query ranges
  mutation_partition: Trim range tombstones to query ranges
  sstables: reader: Inline specialization of sstable_mutation_reader
  sstables: k_l: reader: Trim range tombstones to query ranges
  clustering_ranges_walker: Introduce split_tombstone()
  position_range: Introduce contains() check for ranges
  ...