Support for collections #21

Open
wants to merge 10 commits into base: master

Lorak-mmk
Contributor

This is based on #12 and includes its changes, so review should be done on a per-commit basis.

This PR adds simple support for non-frozen collections.

There is a new config option, scylla.collections.mode, whose only possible value is currently "simple". It selects the format for non-frozen collections (in the future we could add preimage etc.).
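
To make the option concrete, here is a minimal sketch of how a value for scylla.collections.mode could be read and validated. The class, enum, and method names are hypothetical and not taken from the connector; rejecting a missing value and matching case-insensitively are assumptions as well.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of the connector's code base.
class CollectionsModeSketch {
    // Option name as given in this PR.
    static final String KEY = "scylla.collections.mode";

    // Only "simple" exists at this point; further modes (e.g. preimage) are future work.
    enum CollectionsMode { SIMPLE }

    // Reads the option and rejects missing or unsupported values.
    // Case-insensitive matching is an assumption of this sketch.
    static CollectionsMode fromProperties(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            throw new IllegalArgumentException("Missing " + KEY);
        }
        try {
            return CollectionsMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported value for " + KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "simple");
        System.out.println(fromProperties(props)); // prints SIMPLE
    }
}
```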

The simple-mode format is described in README.md (along with the format for frozen collections). In brief: a non-frozen collection is represented as a struct with two fields, "mode" and "elements". "mode" marks the type of operation (add elements, remove elements, or overwrite the collection), and "elements" holds the elements used in that operation.
For a Set, "elements" is simply a Set.
A List is a map with a timeuuid key type. When removing elements, the values are null.
A Map is simply a Map. When removing elements, the values are null.
A UDT is the most complicated. It is represented as a struct, but each field is a Cell, with the same semantics as a column's "Cell": null means no change, a non-null Cell with a null value field means removal, and a non-null Cell with a non-null value field means a new value.
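
The three-way Cell semantics for UDT fields can be sketched as follows. This is only an illustration using plain Java maps: the Cell class and UdtDeltaSketch are hypothetical stand-ins for the Kafka Connect structs the connector emits, and a removed field is modeled here by dropping its key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the "Cell" wrapper: a non-null Cell carries a
// value, and that value may itself be null.
final class Cell {
    final Object value;

    Cell(Object value) {
        this.value = value;
    }
}

// Hypothetical helper showing how a consumer might interpret a UDT delta.
class UdtDeltaSketch {
    // Applies a UDT delta (field name -> Cell) to the current field values.
    static Map<String, Object> apply(Map<String, Object> current, Map<String, Cell> delta) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Cell> entry : delta.entrySet()) {
            Cell cell = entry.getValue();
            if (cell == null) {
                continue; // null Cell: the field was not changed
            }
            if (cell.value == null) {
                result.remove(entry.getKey()); // Cell with null value: field removed
            } else {
                result.put(entry.getKey(), cell.value); // Cell with value: new value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("street", "Main St");
        current.put("city", "Warsaw");
        current.put("zip", "00-001");

        Map<String, Cell> delta = new LinkedHashMap<>();
        delta.put("street", null);            // no change
        delta.put("city", new Cell(null));    // removal
        delta.put("zip", new Cell("00-950")); // new value

        System.out.println(apply(current, delta)); // prints {street=Main St, zip=00-950}
    }
}
```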

I haven't tested it with Avro yet.

Piotr Grabowski and others added 6 commits September 9, 2021 18:35
Add support for including frozen lists in generated changes. Made necessary changes to support nested data types.
Add support for including frozen sets in generated changes.
Add support for including frozen maps in generated changes.
Add support for including tuples in generated changes. For a tuple, a Kafka Connect struct is created with "tuple_member_*" for each member of a tuple (as they can have different data types inside).
Add support for including frozen UDTs in generated changes.
Lorak-mmk changed the title from "Non frozen collections" to "Support for non-frozen collections" on Feb 3, 2022
@Lorak-mmk
Contributor Author

Lorak-mmk commented Feb 8, 2022

Now that I think of it, maybe it's redundant to have "REMOVE" mode and removals should be represented as setting to null (as is currently the case for UDT)? Then we would have 2 modes, let's say "UPDATE" and "OVERWRITE", the difference between them would be whether the collection is cleared before operation.
@avelanarius

@avelanarius
Member

> Now that I think of it, maybe it's redundant to have "REMOVE" mode and removals should be represented as setting to null (as is currently the case for UDT)? Then we would have 2 modes, let's say "UPDATE" and "OVERWRITE", the difference between them would be whether the collection is cleared before operation. @avelanarius

I don't see how it would work for sets?

@hartmut-co-uk

Opinion:
Hi, for me it would be great if we could also have the option (configurable?) to just emit FROZEN collections 'as-is' (i.e. always the full latest value),
so without the extra ELEMENTS_VALUE; REMOVED_ELEMENTS_VALUE; MODE_VALUE.

That would make the output record look cleaner and closer to what you'd get by querying Scylla directly.

@Lorak-mmk
Contributor Author

I pushed a new version with a slightly different representation.
It had to change because the previous one didn't work well with queries that perform more than one modification on a given collection, e.g.: UPDATE ks.t_list SET v = v - [6, 7], v = v + [4, 5] WHERE pk = 1;

Now there are only two modes, OVERWRITE and MODIFY, and the collection struct always has two fields: mode and elements.
For lists and maps, elements is a map: an element is added or overwritten if its value is not null, and removed otherwise.
For sets, elements is a map with boolean values: true means the value was added to the set, false means it was removed.
UDTs didn't change.
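
As a rough illustration of these semantics, the sketch below applies such a delta to plain java.util collections. CollectionDeltaSketch, applyToMap, and applyToSet are hypothetical names, not connector code (the connector emits Kafka Connect structs). The sketch also assumes that OVERWRITE means the collection is cleared before the elements are applied.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical consumer-side helper for the two-mode delta representation.
class CollectionDeltaSketch {
    enum Mode { OVERWRITE, MODIFY }

    // Lists and maps: a null value removes the key, a non-null value adds or
    // overwrites it. For lists the key would be the element's timeuuid.
    static <K, V> Map<K, V> applyToMap(Map<K, V> current, Mode mode, Map<K, V> elements) {
        Map<K, V> result = new LinkedHashMap<>();
        if (mode == Mode.MODIFY) {
            result.putAll(current); // MODIFY starts from the existing contents
        }
        for (Map.Entry<K, V> entry : elements.entrySet()) {
            if (entry.getValue() == null) {
                result.remove(entry.getKey());
            } else {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    // Sets: true means the element was added, false means it was removed.
    static <E> Set<E> applyToSet(Set<E> current, Mode mode, Map<E, Boolean> elements) {
        Set<E> result = new LinkedHashSet<>();
        if (mode == Mode.MODIFY) {
            result.addAll(current);
        }
        for (Map.Entry<E, Boolean> entry : elements.entrySet()) {
            if (Boolean.TRUE.equals(entry.getValue())) {
                result.add(entry.getKey());
            } else {
                result.remove(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One statement that both removes {6, 7} and adds {4, 5} fits in a single delta.
        Set<Integer> current = new LinkedHashSet<>(Arrays.asList(1, 6, 7));
        Map<Integer, Boolean> delta = new LinkedHashMap<>();
        delta.put(6, false);
        delta.put(7, false);
        delta.put(4, true);
        delta.put(5, true);
        System.out.println(applyToSet(current, Mode.MODIFY, delta)); // prints [1, 4, 5]
    }
}
```

Because additions and removals share one elements map, a statement that both adds and removes elements needs only a single MODIFY record.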

I also renamed SIMPLE mode to DELTA, to better reflect what it actually is.

@avelanarius @haaawk

> Opinion: Hi, for me it would be great if we could also have the option (configurable?) to just emit FROZEN collections 'as-is' (...always the full latest value). => so without the extra ELEMENTS_VALUE; REMOVED_ELEMENTS_VALUE; MODE_VALUE.
>
> That would make the output record look cleaner and more like if you'd query Scylla directly.

Yes, that would of course be better, but it is harder (it requires preimage/postimage usage) and will be added in the future. That's why I added a config option to select the mode for non-frozen collections.

Lorak-mmk changed the title from "Support for non-frozen collections" to "Support for collections" on Jan 11, 2024

3 participants