feature request: support for collection types (LIST, SET, MAP) and UDT #9

Open
hartmut-co-uk opened this issue Jun 22, 2021 · 14 comments
Labels
enhancement New feature or request

Comments

@hartmut-co-uk

As a consumer of my CDC event stream (Kafka topic), with CDC enabled on a table that uses collection types (LIST, SET, MAP) and UDTs, I'd like to receive change data for all columns of the *_cdc_log record, including collection-type and UDT fields.

This would allow me to utilise the change event for stream processing as no data is omitted.

Example use cases:

  • any consumer for a table cdc event where collection type / UDT cols have changed
@brbrown25

I’d be interested in assisting with this if no one else is.

@avelanarius
Member

avelanarius commented Sep 9, 2021

The main difficulty in supporting collection types is supporting non-frozen types. In Scylla there are two types of collections/UDTs: frozen and non-frozen. When you update a frozen collection, its entire contents after the update are stored in the CDC log. On the other hand, you can partially update non-frozen collections (such as appending items to a list). In the CDC log, only the added/removed elements would be saved in such a case.
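To make the distinction concrete, here is a minimal Python sketch of what a consumer would see for a set column in each case. It is a simplified model, not Scylla's actual CDC log layout: the `SetColumnDelta` class, its field names, and both helper functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SetColumnDelta:
    """Hypothetical, simplified view of one set column in a CDC log row."""
    # Full new value (frozen) or only the added elements (non-frozen).
    value: Optional[frozenset] = None
    # Elements removed by the update (only meaningful for non-frozen).
    deleted_elements: frozenset = field(default_factory=frozenset)
    # True when `value` is the complete contents after the update.
    is_full_value: bool = False


def log_frozen_update(new_value):
    # A frozen collection is always rewritten as a whole, so the log
    # entry carries its entire contents after the update.
    return SetColumnDelta(value=frozenset(new_value), is_full_value=True)


def log_non_frozen_update(added=(), removed=()):
    # A partial update only records what was added and what was removed;
    # the resulting state of the collection is not part of the entry.
    return SetColumnDelta(
        value=frozenset(added) or None,
        deleted_elements=frozenset(removed),
    )
```

With this model, `log_frozen_update({1, 2, 3})` carries the whole set, while `log_non_frozen_update(added={4})` carries only `{4}`, so a consumer cannot tell what the set looks like afterwards without more information.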

We (cc: @haaawk) decided not to overcomplicate the generated Kafka message to accommodate the different operations on non-frozen collections (appending, removing, overwriting), especially since this is not what the Debezium model expects and most Sink Connectors would not support it. However, if we implemented support for postimages (#8, which we plan to do), the state of a non-frozen collection/UDT after an update would be known, with the additional requirement that you enable postimages on your CDC table. That would make it possible to support non-frozen collection types.

(You can read https://docs.scylladb.com/using-scylla/cdc/cdc-advanced-types/ for more info)

In the meantime, I have pushed a (very early) implementation of support for frozen collections: #12. To support post-images, we plan to implement a higher-level abstraction in the scylla-cdc-java repo that combines pre-image, delta and post-image rows and parses the delta information of non-frozen collection updates.
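As a rough illustration of the kind of combining step described here, the sketch below applies a delta to a pre-image to obtain the state of a non-frozen set or map after an update. It is written in Python for brevity and is not the scylla-cdc-java design; the function names and parameters (`apply_set_delta`, `apply_map_delta`, `overwritten`, and so on) are assumptions made up for this example. Lists are left out of the sketch.

```python
def apply_set_delta(preimage, added=(), removed=(), overwritten=False):
    """Return the contents of a non-frozen set after an update.

    preimage:    contents before the update, or None if the column was unset
    added:       elements the update added
    removed:     elements the update removed
    overwritten: True when the update replaced the whole collection
    """
    # An overwrite discards the previous contents entirely.
    base = set() if overwritten or preimage is None else set(preimage)
    return (base - set(removed)) | set(added)


def apply_map_delta(preimage, put=None, removed_keys=(), overwritten=False):
    """Same idea for a non-frozen map: remove keys, then apply new entries."""
    base = {} if overwritten or preimage is None else dict(preimage)
    for key in removed_keys:
        base.pop(key, None)  # removing a missing key is a no-op
    base.update(put or {})
    return base
```

For example, `apply_set_delta({1, 2, 3}, added={4}, removed={1})` returns `{2, 3, 4}`. If the CDC table has postimages enabled, the post-image row already carries this state, so a step like this would mainly serve to interpret or cross-check the delta.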

@hartmut-co-uk hartmut-co-uk changed the title feature request: support for collection types (LIST, SET, MAP) and UDT [FEEDBACK REQUIRED] feature request: support for collection types (LIST, SET, MAP) and UDT Sep 16, 2021
@hartmut-co-uk hartmut-co-uk changed the title [FEEDBACK REQUIRED] feature request: support for collection types (LIST, SET, MAP) and UDT feature request: support for collection types (LIST, SET, MAP) and UDT Sep 16, 2021
@hartmut-co-uk
Author

hartmut-co-uk commented Sep 16, 2021

(apologies for issue title rename, wrong browser tab -> please ignore)

@hartmut-co-uk
Author

Hi @avelanarius, is there an ETA for post-image support?
Alternatively, could the support for frozen collections (#12) be completed and merged any time soon?

@hartmut-co-uk
Author

> To support post-images, we plan to implement a higher-level abstraction in scylla-cdc-java repo, that combines pre-images, delta and post-image rows and parses delta information of non-frozen collection updates.

@avelanarius, is this already in the making, and are you looking for contributors?
Are there any dependencies on an upcoming Scylla release (4.6+/5.0)?

@jain-vandit

@avelanarius @hartmut-co-uk
Can we merge #12 to have support for collection types / UDTs? Is there something blocking us from going ahead with this?

@hartmut-co-uk
Author

I have done more code changes on my fork last week to accommodate using UDT with Avro, but haven't had time to test them yet.
I'll try to make time this week to progress this further.

@haaawk

haaawk commented Jan 18, 2022

@avelanarius and @Lorak-mmk are working on support for frozen and non-frozen collections.

@jain-vandit

Hi @hartmut-co-uk @avelanarius,
can I create a fork from #12 and use it? Did you do any testing for this, or shall I do it?

@Lorak-mmk
Contributor

If I remember correctly, #12 contains a performance problem. If you want to use a non-merged version, then #21 should be better: it is based on #12, supports non-frozen collections too, and doesn't have the performance problem I mentioned.

@mykaul mykaul added the enhancement New feature or request label Jan 16, 2023
@hansh0801

track

@alonomri

Hi, is there a plan to merge #21? We could really use this feature.

@arceushui

+1. This is an important feature

@BruAPAHE

+1
