Nested Record Processing #41
snork-alt
started this conversation in
Feature Requests
Replies: 2 comments
-
@v3g42 @duonganhthu43 I felt using the WHERE condition between child and parent was getting very complex to implement. See what you think of this, which is easier to implement. Later we could also consider implementing the WHERE |
Beta Was this translation helpful? Give feedback.
0 replies
-
@snork-alt We still will filter by the foreign key value I am guessing ? Would this syntax be more appropriate ?
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Let's consider the following example source tables:
users
users_contact
We want to populate the cache with an object containing the following information:
username, fn, ln, country
fromusers
tabletype, contact
fromusers_contact
tableThe final object will look like this:
Query definition and schema registration
In order to obtain the above object, the user should define a new endpoint specifying the following query:
Two new SQL keywords are defined by DOZER SQL:
NESTED_ARR
: Allows the user to specify a subquery that will be represented as an array of records in the cacheINDEX BY
: Allows the user to specify what field is used for indexing in the cache. In the example above the parent object usesid
as a primary key. For the nested query, however, we are using two fields:user_id
andid
.user_id
is used for creating the relationship between the nested object and the parent object. In the source tablesuser_id
is a foreign key creating in a relationship between theuser_contacts
table and theusers
table.id
represents the primary key of theusers_contact
tableUpon definition of the query Dozer will register a new schema that will look like the following, assigning a new unique id to it
Pipeline
The pipeline will take care of processing any CDC event produced by the two tables above, processing the query and generating
Operation
objects following the schema above. Let's consider an example:New record added in
user_contacts
In the following example, a new email contact is being added for the user john123. Here is the Operation message produced at the end of the pipeline, to be consumed by the cache writer.
In the above examples, only the fields from the
users_contact
table are propagated. The only field propagated from the parent table is theid
, which is needed for the correct indexing in the cache. All the field values are positional and matching to the schema previously registered. Theindex
field in the schema definition defines the offset of each fieldCache population
Each nested record in the cache is mapped to a specific key/value entry. For the object provided above the following is the physical cache representation
/u1
{username: "john123", fn: "John", ln: "Smith", country: "SG"}
/u1/4:c1
{type: "phone", contact: "+65 776536 "}
/u1/4:c2
{type: "email", contact: " john123@gmail.com"}
/u1/4:c4
{type: "email", contact: "john1234@gmail.com"}
For the parent object, the key is simply represented by the value of the field indicated in the
INDEX BY
keyword in parent query. For the inner object, the key is represented by the values of the two fields indicated in the 'INDEX BY' keyword in in the nested query. Note that, for every inner object, the value is prefixed with the index of the field it is referring to (4
in this schema, as it it shown in the schema definition)Beta Was this translation helpful? Give feedback.
All reactions