CDC: collection updates which are split have incorrect pre/postimages #6597

Closed
piodul opened this issue Jun 5, 2020 · 0 comments · Fixed by #6741

piodul commented Jun 5, 2020

Consider the following example:

cqlsh:ks> create table tbl (pk int PRIMARY KEY, c list<int>) with cdc = {'enabled': true, 'preimage': true, 'postimage': true};
cqlsh:ks> begin unlogged batch update tbl using ttl 10000 set c = c + [1] where pk = 0; update tbl using ttl 20000 set c = c + [2] where pk = 0; apply batch;
cqlsh:ks> select * from tbl_scylla_cdc_log;

 cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | c                                         | cdc$deleted_c | cdc$deleted_elements_c | cdc$operation | cdc$ttl | pk
------------------------------------+--------------------------------------+------------------+-------------------------------------------+---------------+------------------------+---------------+---------+----
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                0 | {d042a360-a728-11ea-a8e8-000000000000: 1} |          null |                   null |             1 |   10000 |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                1 | {d042a360-a728-11ea-a8e8-000000000000: 1} |          null |                   null |             9 |    null |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                2 | {d042a361-a728-11ea-a8e8-000000000000: 2} |          null |                   null |             1 |   20000 |  0
 0xcf452e9350d6490d58c082071c5c3082 | d0428b0a-a728-11ea-5760-719318441668 |                3 | {d042a361-a728-11ea-a8e8-000000000000: 2} |          null |                   null |             9 |    null |  0

(4 rows)
cqlsh:ks> select * from tbl;

 pk | c
----+--------
  0 | [1, 2]

(1 rows)

The batch contains two UPDATEs, each of them appending a cell to the same row. Appended cells have different TTLs, therefore we create separate log entries for them.

The first two rows of the tbl_scylla_cdc_log correspond to cell 1 with TTL 10000. The table was empty before the first update, so there is no preimage. There is a postimage, and its value is correct.

However, the information about cell 2 with TTL 20000 is wrong: there is no preimage, and the postimage is incorrect (it contains only cell 2, not cell 1).

slivne added this to the 4.2 milestone on Jun 16, 2020
nyh added a commit that referenced this issue on Jul 9, 2020
Merged pull request #6741 by Piotr Dulikowski:

This PR changes the algorithm used to generate preimages and postimages
in the CDC log. Its behavior is the same for non-batch operations
(with one exception described later), but it places preimages and
postimages at the boundaries of each cdc group instead of around every
delta row, and it accounts for multiple updates to the same row in one
CQL batch.

Fixes #6597, #6598

Tests:
- unit(dev), for each consecutive commit
- unit(debug), for the last commit

Previous method

The previous method worked on a per-delta-row basis. First, the base
table was queried for the current state of the rows being modified in
the processed mutation (this is called the "preimage query"). Then,
for each delta row (representing a modification of a row):

- If preimage was enabled and the row was already present in the table,
  a corresponding preimage row was inserted before the delta row.
  The preimage row contained data taken directly from the preimage
  query result. Only columns modified by the delta were included in
  the preimage.
- If postimage was enabled, a postimage row was inserted after the
  delta row. The postimage row contained the row data from the
  preimage query result with the change represented by the
  corresponding delta row applied. All columns of the row were
  included in the postimage.

The above works well for simple cases such as a single CQL INSERT,
UPDATE, or DELETE, or a simple CQL BATCH. An example:

cqlsh:ks> BEGIN UNLOGGED BATCH
			INSERT INTO tbl (pk, ck, v) VALUES (0, 1, 111);
			INSERT INTO tbl (pk, ck, v) VALUES (0, 2, 222);
			APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl",
			pk, ck, v from ks.tbl_scylla_cdc_log ;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v
------------------+---------------+---------+----+----+-----
...snip...
                0 |             0 |    null |  0 |  1 | 100
                1 |             2 |    null |  0 |  1 | 111
                2 |             9 |    null |  0 |  1 | 111
                3 |             0 |    null |  0 |  2 | 200
                4 |             2 |    null |  0 |  2 | 222
                5 |             9 |    null |  0 |  2 | 222

Preimage rows are represented by cdc$operation 0, and postimage rows
by 9. Note that all rows presented above share the same value of the
cdc$time column, which is not shown here for brevity.

Problems with previous approach

This simple algorithm has some conceptual and implementation problems
which arise when processing more complicated CQL batches. Consider
the following example:

cqlsh:ks> BEGIN UNLOGGED BATCH
			INSERT INTO tbl (pk, ck, v1) VALUES (0, 0, 1) USING TTL 1000;
			INSERT INTO tbl (pk, ck, v2) VALUES (0, 0, 2) USING TTL 2000;
			APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl",
			pk, ck, v1, v2 FROM tbl_scylla_cdc_log;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v1   | v2
------------------+---------------+---------+----+----+------+------
...snip...
                0 |             0 |    null |  0 |  0 | null |    0
                1 |             2 |    2000 |  0 |  0 | null |    2
                2 |             9 |    null |  0 |  0 |    0 |    2
                3 |             0 |    null |  0 |  0 |    0 | null
                4 |             1 |    1000 |  0 |  0 |    1 | null
                5 |             9 |    null |  0 |  0 |    1 |    0

A single cdc group (corresponding to rows sharing the same cdc$time)
might have more than one delta that modifies the same row. For example,
this happens when modifying two columns of the same row with
different TTLs - due to our choice of CDC log schema, we must
represent such a change with two delta rows.

It does not make sense to present a postimage after the first delta
and a preimage before the second - both deltas are applied
simultaneously by the same CQL BATCH, so the middle "image" is purely
imaginary and never exists in the table.

Moreover, in this example, the last postimage is wrong - it reflects
the update to v1 (1) but still shows the old value of v2 (0 instead
of 2). None of the postimages presented above represents the
final state of the row.
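
To see where these rows come from, the per-delta method can be modelled in a few lines of Python. This is only an illustrative sketch, not the Scylla implementation: rows and deltas are plain dicts, `previous_method` is a hypothetical name, and the string "delta" stands in for the real delta operation codes. The starting values v1 = 0 and v2 = 0 are taken from the preimage rows in the log above.

```python
PREIMAGE, POSTIMAGE = 0, 9  # cdc$operation codes used in the log above

def previous_method(query_row, deltas, all_columns):
    """Per-delta sketch: every delta gets its own preimage and postimage,
    both derived from the single preimage query result `query_row`."""
    log = []
    for delta in deltas:
        if query_row is not None:
            # Preimage: only the columns modified by this delta.
            log.append((PREIMAGE, {c: query_row.get(c) for c in delta}))
        log.append(("delta", dict(delta)))
        # Postimage: the query result plus this one delta, all columns.
        after = dict(query_row or {})
        after.update(delta)
        log.append((POSTIMAGE, {c: after.get(c) for c in all_columns}))
    return log

# The two deltas from the batch above, in the order they appear in the log.
log = previous_method({"v1": 0, "v2": 0}, [{"v2": 2}, {"v1": 1}], ["v1", "v2"])
postimages = [cols for op, cols in log if op == POSTIMAGE]
print(postimages)  # [{'v1': 0, 'v2': 2}, {'v1': 1, 'v2': 0}]
```

Because each postimage starts again from the same query result, neither of them contains both updates.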

New algorithm

The new algorithm works on a per-cdc-group basis rather than per
delta row. When processing of a CQL BATCH starts:

- Load preimage query results into a data structure representing
  current state of the affected rows.

For each cdc group:

- For each row modified within the group, produce a preimage,
  regardless of whether the row was present in the table. The preimage
  is calculated from the current state and includes only the columns
  that are modified for this row within the group.
- For each delta, produce a delta row and update the current state
  accordingly.
- Produce postimages in the same way as preimages, but include all
  columns of each row in the postimage.

The new algorithm produces the postimage correctly when multiple deltas
affect one row, because the state of the row is updated on the fly.

This algorithm moves preimage and postimage rows to the beginning and
the end of the cdc group, respectively. This solves the problem of
imaginary preimages and postimages appearing inside a cdc group.
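
The per-group steps can be sketched in Python as follows. This is a simplified model under stated assumptions, not the code from the PR: `log_cdc_group` is a hypothetical name, the state is a dict keyed by row, deltas are plain column assignments (deletions and collections are not modelled), and "delta" stands in for the real operation codes.

```python
PREIMAGE, POSTIMAGE = 0, 9  # cdc$operation codes for pre/postimage rows

def log_cdc_group(state, deltas, all_columns):
    """Emit log rows for one cdc group and update `state` in place.

    state:  dict row_key -> {column: value}, the tracked table state
    deltas: list of (row_key, {column: new_value}) within the group
    """
    # Columns touched per row across the whole group, in first-seen order.
    touched = {}
    for key, changes in deltas:
        cols = touched.setdefault(key, [])
        cols.extend(c for c in changes if c not in cols)

    log = []
    # 1. Preimages first: one per modified row, even if the row is absent.
    for key, cols in touched.items():
        row = state.get(key, {})
        log.append((PREIMAGE, key, {c: row.get(c) for c in cols}))
    # 2. Deltas, applied to the tracked state as they are emitted.
    for key, changes in deltas:
        log.append(("delta", key, dict(changes)))
        state.setdefault(key, {}).update(changes)
    # 3. Postimages last, with all columns of each modified row.
    for key in touched:
        log.append((POSTIMAGE, key, {c: state[key].get(c) for c in all_columns}))
    return log

state = {(0, 0): {"v1": 0, "v2": 0}}
group = [((0, 0), {"v2": 2}), ((0, 0), {"v1": 1})]
log = log_cdc_group(state, group, ["v1", "v2"])
print(log[-1])  # (9, (0, 0), {'v1': 1, 'v2': 2})
```

With the two-TTL batch from the earlier example, this model yields one preimage, two delta rows, and a single postimage that contains both new values.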

Unfortunately, it is possible for one CQL BATCH to contain changes that
use multiple timestamps. This will result in one CQL BATCH creating
multiple cdc groups with different cdc$time values. As it is
impossible, with our choice of schema, to tell that those cdc groups
were created from one CQL BATCH, we instead treat those groups as if
they were separate CQL operations. By tracking the state of the
affected rows, we make sure that the preimage in later groups reflects
changes introduced in previous groups.
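
A rough Python sketch of this state tracking across groups is shown below. It assumes, for illustration only, that the changes target a single row, that groups are formed by equal timestamps and processed in ascending timestamp order, and that the postimage is simply the tracked state; `images_per_group` is a hypothetical helper, not a function from the PR.

```python
from itertools import groupby

def images_per_group(row, changes):
    """changes: list of (timestamp, {column: value}) from one batch, all
    targeting the same row. Returns [(timestamp, preimage, postimage)]."""
    state = dict(row)  # tracked state, seeded from the preimage query result
    out = []
    ordered = sorted(changes, key=lambda change: change[0])
    for ts, members in groupby(ordered, key=lambda change: change[0]):
        members = list(members)
        # Preimage: columns modified in this group, read from tracked state.
        cols = [c for _, delta in members for c in delta]
        preimage = {c: state.get(c) for c in cols}
        # Apply every delta of the group before taking the postimage.
        for _, delta in members:
            state.update(delta)
        out.append((ts, preimage, dict(state)))
    return out

result = images_per_group({"v": 1}, [(20, {"v": 3}), (10, {"v": 2})])
print(result)  # [(10, {'v': 1}, {'v': 2}), (20, {'v': 2}, {'v': 3})]
```

The preimage of the second group shows v = 2, the value written by the first group, rather than the value originally read from the table.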

One more thing - this algorithm should produce the same results for
single CQL operations and simple CQL batches, with one exception.
Previously, a preimage was not produced if the row was not present in
the table. Now, the preimage row will appear unconditionally - it will
have nulls in place of column values.
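
The difference can be illustrated with a small hypothetical helper (`preimage_row` and its `always_emit` flag are made up for this sketch; `None` plays the role of a CQL null):

```python
def preimage_row(current_row, modified_columns, always_emit=True):
    """Return the preimage columns for a row, or None if no preimage
    row is written. `current_row` is None when the row is absent."""
    if current_row is None and not always_emit:
        return None  # previous behavior: no preimage for a missing row
    row = current_row or {}
    # Missing values become None, i.e. nulls in the log row.
    return {c: row.get(c) for c in modified_columns}

print(preimage_row(None, ["v"], always_emit=False))  # None
print(preimage_row(None, ["v"]))                     # {'v': None}
```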

* 'cdc-pre-postimage-persistence' of github.com:piodul/scylla:
  cdc: fix indentation
  cdc: don't update partition state when not needed
  cdc: implement pre/postimage persistence
  cdc: add interface for producing pre/postimages
  cdc: load preimage query result into partition state fields
  cdc: introduce fields for keeping partition state
  cdc: rename set_pk_columns -> allocate_new_log_row
  cdc: track batch_no inside transformer
  cdc: move cdc$time generation to transformer
  cdc: move find_timestamp to split.cc
  cdc: introduce change_processor interface
  cdc: remove redundant schema arguments from cdc functions
  cdc: move management of generated mutations inside transformer
  cdc: move preimage result set into a field of transformer
  cdc: keep ts and tuuid inside transformer
  cdc: track touched parts of mutations inside transformer
  cdc: always include preimage for affected rows