Always use prepared statements for replication #13

Open
kbr- opened this issue Dec 22, 2020 · 2 comments
Labels
replicator Replicator example application

Comments

kbr- commented Dec 22, 2020

// If the replicated table has a non-frozen collection, for example (list<int>):
//
// CREATE TABLE ks.t(pk int, ck int, v list<int>, PRIMARY KEY(pk, ck)) WITH cdc = {'enabled': true};
//
// there could be an operation like this:
//
// UPDATE ks.t SET v = v + [1, 2] WHERE pk = 0 AND ck = 0;
//
// that cannot be handled by a prepared statement: CDC log contains only the added elements,
// not the final result of the operation.
//
// Therefore, in such case do not use prepared statements.
boolean hasNonFrozenCollection = sourceTableMetadata.getColumns().stream()
        .anyMatch(c -> c.getType().isCollection() && !c.getType().isFrozen());

We can always use prepared statements. In some cases this requires a batch statement, but it is still possible.
For example, this prepared batch can handle any update to a non-frozen map:

begin unlogged batch
update ks.t set v = v + :added, v = v - :removed where pk = :pk and ck = :ck;
update ks.t set v = :replaced where pk = :pk and ck = :ck;
apply batch

Bind the markers as needed: :added from the CDC log's v column, :removed from cdc$deleted_elements_v, and :replaced according to cdc$deleted_v (bind null if the column was deleted; leave it unset otherwise).

kbr- commented Dec 22, 2020

Also, until scylladb/scylladb#7825 is fixed, always bind :removed in the above example (bind an empty set if no elements were removed).
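
The binding rules above, including the workaround for scylladb/scylladb#7825, could be sketched roughly as follows. This is only an illustration in plain Java: the CDC log row is modeled as a `Map` from column name to value, and `MapUpdateBindings`, `bindingsFor` and the `UNSET` sentinel are hypothetical names, not driver API. Leaving :added unset when the CDC row carries no added elements is an assumption as well.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapUpdateBindings {
    // Hypothetical sentinel meaning "leave this marker unset".
    // A real driver has its own mechanism for unbound markers.
    public static final Object UNSET = new Object() {
        @Override
        public String toString() {
            return "UNSET";
        }
    };

    // Computes the values for :added, :removed and :replaced from one CDC log row.
    // cdcRow maps CDC log column names to values; column is the base column name.
    public static Map<String, Object> bindingsFor(Map<String, Object> cdcRow, String column) {
        Map<String, Object> bindings = new LinkedHashMap<>();

        // :added comes from the CDC log column named like the base column.
        // Assumption: leave it unset when there is nothing to add.
        Object added = cdcRow.get(column);
        bindings.put("added", added != null ? added : UNSET);

        // :removed comes from cdc$deleted_elements_<column>. It is always bound,
        // falling back to an empty set (workaround for scylladb/scylladb#7825).
        Object removed = cdcRow.get("cdc$deleted_elements_" + column);
        bindings.put("removed", removed != null ? removed : Collections.emptySet());

        // :replaced is bound to null when cdc$deleted_<column> is true, unset otherwise.
        boolean deleted = Boolean.TRUE.equals(cdcRow.get("cdc$deleted_" + column));
        bindings.put("replaced", deleted ? null : UNSET);

        return bindings;
    }
}
```

A real replicator would translate `UNSET` into whatever its driver offers for leaving a marker unbound, and would bind :pk and :ck from the same CDC row.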

kbr- commented Dec 22, 2020

In general, I think a batch of two statements can handle every case:

for cdc$operation = 1 (update):

  • one update for adding and removing elements from non-frozen collections and setting non-frozen UDT fields; for non-frozen lists, always use scylla_timeuuid_list_index
  • one update for everything else: setting other fields, and setting non-frozen collections to null if cdc$deleted_X is true

for cdc$operation = 2 (insert):

  • one update for setting non-frozen list elements using scylla_timeuuid_list_index
  • one insert for everything else: creating the row marker, replacing non-frozen collections (including lists: setting the list to null here combines with the update above to produce the new list), and replacing all other types of fields
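
The split described above could be sketched as a small planner that decides which of the two statements carries which work. This is a rough illustration under stated assumptions, not the replicator's code: `ReplicationBatchPlanner`, `Kind`, `Column` and `Plan` are hypothetical names, columns are classified by hand rather than through driver metadata, and each piece of work is recorded as a descriptive string instead of real CQL.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicationBatchPlanner {
    // Values of cdc$operation named in the comment above.
    static final int OP_UPDATE = 1;
    static final int OP_INSERT = 2;

    // Hypothetical column classification, not a driver type.
    enum Kind { REGULAR, NON_FROZEN_SET_OR_MAP, NON_FROZEN_LIST, NON_FROZEN_UDT }

    static final class Column {
        final String name;
        final Kind kind;
        Column(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // The two statements of the batch, each described by the work it carries.
    static final class Plan {
        final String firstVerb;
        final String secondVerb;
        final List<String> first = new ArrayList<>();
        final List<String> second = new ArrayList<>();
        Plan(String firstVerb, String secondVerb) {
            this.firstVerb = firstVerb;
            this.secondVerb = secondVerb;
        }
    }

    static Plan plan(int cdcOperation, List<Column> columns) {
        if (cdcOperation != OP_UPDATE && cdcOperation != OP_INSERT) {
            throw new IllegalArgumentException("unsupported cdc$operation: " + cdcOperation);
        }
        boolean insert = cdcOperation == OP_INSERT;
        // The first statement is always an UPDATE; the second matches the operation.
        Plan plan = new Plan("UPDATE", insert ? "INSERT" : "UPDATE");
        for (Column c : columns) {
            String nullIfDeleted = c.name + ": set to null if cdc$deleted_" + c.name;
            switch (c.kind) {
                case NON_FROZEN_LIST:
                    // List elements always go through scylla_timeuuid_list_index.
                    plan.first.add(c.name + ": set elements via scylla_timeuuid_list_index");
                    plan.second.add(insert ? c.name + ": set to null (replace)" : nullIfDeleted);
                    break;
                case NON_FROZEN_SET_OR_MAP:
                    if (insert) {
                        plan.second.add(c.name + ": replace whole collection");
                    } else {
                        plan.first.add(c.name + ": add and remove elements");
                        plan.second.add(nullIfDeleted);
                    }
                    break;
                case NON_FROZEN_UDT:
                    if (insert) {
                        plan.second.add(c.name + ": replace whole value");
                    } else {
                        plan.first.add(c.name + ": set individual fields");
                    }
                    break;
                default:
                    plan.second.add(c.name + ": set value");
            }
        }
        return plan;
    }
}
```

For an insert on a table with a non-frozen list `v` and a regular column `x`, this plan puts the list elements into the first UPDATE and both the list reset and `x` into the INSERT, which also creates the row marker.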

avelanarius added the replicator label Mar 2, 2021