New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: DML Design #17
RFC: DML Design #17
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if we do not support primary key def on table, we should forbidden this behaviour (?) and report to user (Currently seems we do not do that).
|
||
#### Materialize check | ||
|
||
The consistency and constraints check will be done in materialize executor. For every operation from upstream, the materialize executor will lookup its pk from storage and fix the changes if it is conflict with the storage's data. And because of the look up queries, a cache is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to design the cache?
If For every operation (INSERT\UPDATE\DELETE), the cost seems huge and hard to mitigate... Also I think it looks similar to Concurrent DELETE doc proposed by @fuyufjh a long time ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is that doc mentions. The check is to ensure the external data is correct. And we should do the check to ensure the data is ok and do not lead to panic of other component in our system
rfcs/0017-DML.md
Outdated
|
||
#### DML manager | ||
|
||
The `TableSourceManager` will rename as `DMLManager`. It will be indexed as `table_id` instead of `source_id` of the table source. The DML batch executors can get the `TableSource` for DML operation and the `Mutex<RowIdGenerator>` easily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DDL on MV will not go through SourceManager? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
Co-authored-by: Bowen <36908971+BowenXiao1999@users.noreply.github.com>
This comment was marked as duplicate.
This comment was marked as duplicate.
LGTM. |
rfcs/0017-DML.md
Outdated
|
||
#### Generate row_id in BatchInsert | ||
|
||
When user do an insert on a table without PK, we should help to generate the row_id for each records. Now we can only generate row_id in the SourceExecutor. But now, we might need to generate row id for both external data and DML insert's data. So the `BatchInsert` executor and `SourceExecutor` will share the `RowIdGenerator`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the
BatchInsert
executor andSourceExecutor
will share theRowIdGenerator
Please allow me to clarify that I don't have any insistence on this part. Our current approach is to let the BatchInsert
insert rows with row_id NULL
, which is acceptable to me. I just hope to keep it simple as possible, so let's see which approach is better when writing the code. 😋
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me change my mind that we can not ensure we can reorder the DML messages in the future. Treating Insert
and other DMLs differently and separating them into different two channels will break any happens-before relation between them. I am not sure we should support the strict transaction now because our other component(such as frontend and batch) has not supported it. But I'd like to leave the possibility with a little price.
Co-authored-by: Eric Fu <fuyufjh@gmail.com>
Besides, We can move
|
By the way, in the future we may need to optionally support persisting users' writes somewhere, to ensure no |
This comment was marked as duplicate.
This comment was marked as duplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first query description in the figure should be "no user defined pk". Rest LGTM.
Preview: https://github.com/risingwavelabs/rfcs/blob/sts/DML/rfcs/0017-DML.md