CDC: collection updates which are split have incorrect pre/postimages #6597

piodul · 2020-06-05T12:55:36Z

Consider the following example:

cqlsh:ks> create table tbl (pk int PRIMARY KEY, c list<int>) with cdc = {'enabled': true, 'preimage': true, 'postimage': true};
cqlsh:ks> begin unlogged batch update tbl using ttl 10000 set c = c + [1] where pk = 0; update tbl using ttl 20000 set c = c + [2] where pk = 0; apply batch;
cqlsh:ks> select * from tbl_scylla_cdc_log;

 cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | c                                         | cdc$deleted_c | cdc$deleted_elements_c | cdc$operation | cdc$ttl | pk
------------------------------------+--------------------------------------+------------------+-------------------------------------------+---------------+------------------------+---------------+---------+----
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                0 | {d042a360-a728-11ea-a8e8-000000000000: 1} |          null |                   null |             1 |   10000 |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                1 | {d042a360-a728-11ea-a8e8-000000000000: 1} |          null |                   null |             9 |    null |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                2 | {d042a361-a728-11ea-a8e8-000000000000: 2} |          null |                   null |             1 |   20000 |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                3 | {d042a361-a728-11ea-a8e8-000000000000: 2} |          null |                   null |             9 |    null |  0

(4 rows)
cqlsh:ks> select * from tbl;

 pk | c
----+--------
  0 | [1, 2]

(1 rows)

The batch contains two UPDATEs, each of them appending a cell to the same row. Appended cells have different TTLs, therefore we create separate log entries for them.

The first two rows of tbl_scylla_cdc_log correspond to cell 1 with TTL 10000. The table was empty before the first update, so there is no preimage. There is a postimage with the correct value.

However, the information about cell 2 with TTL 20000 is wrong: there is no preimage, and the postimage is incorrect (it contains only cell 2, not cell 1).

Merged pull request #6741 by Piotr Dulikowski:

This PR changes the algorithm used to generate preimages and postimages in the CDC log. While its behavior is the same for non-batch operations (with one exception described later), it generates pre/postimages that are organized more nicely and that account for multiple updates to the same row in one CQL batch.

Fixes #6597, #6598

Tests:
- unit(dev), for each consecutive commit
- unit(debug), for the last commit

Previous method

The previous method worked on a per delta row basis. First, the base table is queried for the current state of the rows being modified in the processed mutation (this is called the "preimage query"). Then, for each delta row (representing a modification of a row):

- If preimage is enabled and the row was already present in the table, a corresponding preimage row is inserted before the delta row. The preimage row contains data taken directly from the preimage query result. Only columns that are modified by the delta are included in the preimage.
- If postimage is enabled, a postimage row is inserted after the delta row. The postimage row contains the result of taking the row data directly from the preimage query result and applying the change that the corresponding delta row represents. All columns of the row are included in the postimage.

The above works well for simple cases such as a single CQL INSERT, UPDATE, or DELETE, or simple CQL BATCH-es. An example:

cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v) VALUES (0, 1, 111); INSERT INTO tbl (pk, ck, v) VALUES (0, 2, 222); APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v from ks.tbl_scylla_cdc_log ;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v
------------------+---------------+---------+----+----+-----
...snip...
                0 |             0 |    null |  0 |  1 | 100
                1 |             2 |    null |  0 |  1 | 111
                2 |             9 |    null |  0 |  1 | 111
                3 |             0 |    null |  0 |  2 | 200
                4 |             2 |    null |  0 |  2 | 222
                5 |             9 |    null |  0 |  2 | 222

Preimage rows are represented by cdc operation 0, and postimage rows by 9. Please note that all rows presented above share the same value of the cdc$time column, which is not shown here for brevity.

Problems with previous approach

This simple algorithm has some conceptual and implementational problems which arise when processing more complicated CQL BATCH-es. Consider the following example:

cqlsh:ks> BEGIN UNLOGGED BATCH INSERT INTO tbl (pk, ck, v1) VALUES (0, 0, 1) USING TTL 1000; INSERT INTO tbl (pk, ck, v2) VALUES (0, 0, 2) USING TTL 2000; APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl", pk, ck, v1, v2 FROM tbl_scylla_cdc_log;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v1   | v2
------------------+---------------+---------+----+----+------+------
...snip...
                0 |             0 |    null |  0 |  0 | null |    0
                1 |             2 |    2000 |  0 |  0 | null |    2
                2 |             9 |    null |  0 |  0 |    0 |    2
                3 |             0 |    null |  0 |  0 |    0 | null
                4 |             1 |    1000 |  0 |  0 |    1 | null
                5 |             9 |    null |  0 |  0 |    1 |    0

A single cdc group (corresponding to rows sharing the same cdc$time) might have more than one delta that modifies the same row. For example, this happens when modifying two columns of the same row with different TTLs: due to our choice of CDC log schema, we must represent such a change with two delta rows.

It does not make sense to present a postimage after the first delta and a preimage before the second. Both deltas are applied simultaneously by the same CQL BATCH, so the middle "image" is purely imaginary and does not appear at any point in the table.

Moreover, in this example, the last postimage is wrong: v1 is updated, but v2 is not. None of the postimages presented above represents the final state of the row.

New algorithm

The new algorithm works on a per cdc group basis, not per delta row.
When starting to process a CQL BATCH:

- Load the preimage query results into a data structure representing the current state of the affected rows.

For each cdc group:

- For each row modified within the group, a preimage is produced, regardless of whether the row was present in the table. The preimage is calculated based on the current state. Only columns that are modified for this row within the group are included.
- For each delta, produce a delta row and update the current state accordingly.
- Produce postimages in the same way as preimages, but include all columns for each row in the postimage.

The new algorithm produces the postimage correctly when multiple deltas affect one row, because the state of the row is updated on the fly.

This algorithm moves preimage and postimage rows to the beginning and the end of the cdc group, respectively. This solves the problem of imaginary preimages and postimages appearing inside a cdc group.

Unfortunately, it is possible for one CQL BATCH to contain changes that use multiple timestamps. This results in one CQL BATCH creating multiple cdc groups with different cdc$time values. As it is impossible, with our choice of schema, to tell that those cdc groups were created from one CQL BATCH, we instead treat those groups as if they were separate CQL operations. By tracking the state of the affected rows, we make sure that the preimage in later groups reflects changes introduced in previous groups.

One more thing: this algorithm should produce the same results for single CQL operations and simple CQL BATCH-es, with one exception. Previously, a preimage was not produced if a row was not present in the table. Now, the preimage row appears unconditionally, with nulls in place of column values.
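The per-group steps above can be sketched in a few lines of Python. This is only an illustrative model, not the C++ code from the PR: the names process_group and process_batch, the dictionary layout of the row state, and the explicit columns argument are all assumptions made for the example, and TTLs and cdc$time are not modeled. It replays the problematic two-TTL batch from the earlier example, assuming the row previously held v1 = 0 and v2 = 0.

```python
from copy import deepcopy

# cdc$operation values as they appear in the log examples above.
PREIMAGE, UPDATE, INSERT, POSTIMAGE = 0, 1, 2, 9


def process_group(state, deltas, columns):
    """Emit log rows for one cdc group and update `state` in place.

    state:   {row_key: {column: value}}, the tracked state of affected rows
    deltas:  list of (operation, row_key, {column: new_value})
    columns: all regular columns of the (hypothetical) base table
    """
    # Collect the columns touched per row within this group.
    touched = {}
    for _, key, changes in deltas:
        touched.setdefault(key, {}).update(dict.fromkeys(changes))

    log = []
    # 1. Preimages first: only columns modified within the group, emitted
    #    even when the row did not exist (values are then None, i.e. null).
    for key, cols in touched.items():
        row = state.get(key, {})
        log.append((PREIMAGE, key, {c: row.get(c) for c in cols}))
    # 2. Deltas, each applied to the tracked state as it is emitted.
    for op, key, changes in deltas:
        log.append((op, key, dict(changes)))
        state.setdefault(key, {}).update(changes)
    # 3. Postimages last: all columns of every row modified in the group.
    for key in touched:
        log.append((POSTIMAGE, key, {c: state[key].get(c) for c in columns}))
    return log


def process_batch(preimage_query_result, groups, columns):
    """Process every cdc group of a batch against one shared row state."""
    state = deepcopy(preimage_query_result)
    log = []
    for deltas in groups:
        log.extend(process_group(state, deltas, columns))
    return log


# The two-TTL batch: both deltas land in the same cdc group.
existing = {(0, 0): {"v1": 0, "v2": 0}}
group = [(INSERT, (0, 0), {"v2": 2}), (UPDATE, (0, 0), {"v1": 1})]
for row in process_batch(existing, [group], columns=("v1", "v2")):
    print(row)
```

In this model the group yields one preimage covering both v1 and v2, the two deltas, and a single postimage with v1 = 1 and v2 = 2, so no intermediate image appears between the deltas. Passing several groups to process_batch shares the tracked state between them, which is how a later group's preimage picks up changes made by an earlier one.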
* 'cdc-pre-postimage-persistence' of github.com:piodul/scylla:
  cdc: fix indentation
  cdc: don't update partition state when not needed
  cdc: implement pre/postimage persistence
  cdc: add interface for producing pre/postimages
  cdc: load preimage query result into partition state fields
  cdc: introduce fields for keeping partition state
  cdc: rename set_pk_columns -> allocate_new_log_row
  cdc: track batch_no inside transformer
  cdc: move cdc$time generation to transformer
  cdc: move find_timestamp to split.cc
  cdc: introduce change_processor interface
  cdc: remove redundant schema arguments from cdc functions
  cdc: move management of generated mutations inside transformer
  cdc: move preimage result set into a field of transformer
  cdc: keep ts and tuuid inside transformer
  cdc: track touched parts of mutations inside transformer
  cdc: always include preimage for affected rows

piodul added bug area/cdc labels Jun 5, 2020

haaawk assigned piodul Jun 15, 2020

slivne added this to the 4.2 milestone Jun 16, 2020

piodul mentioned this issue Jul 1, 2020

cdc: better pre/postimages for complicated batches #6741

Merged

scylladb-promoter closed this as completed in #6741 Jul 10, 2020
