-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: on conflict clause #48
Conversation
Co-authored-by: xxchan <xxchan22f@gmail.com>
Co-authored-by: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>
(1 row) | ||
``` | ||
|
||
The most strange part is that the input parameter and return value of the "reduce function" must have the same schema. But it can be easily workaround with constant expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about how schema change will handle this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the mechanism to handle the outdated schema has no conflict with this RFC. 🤔 We have the same issue before we implement this RFC
COALESCE(EXCLUDED.b, b)); | ||
``` | ||
|
||
This feature will be useful when user use wide-table schema data model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this is similar to Apache Doris' pre-aggregate data model, where the data of the same PK can be automatically merged according to some custom rules. However, as one of its selling points, Doris has a specially designed storage to do this pre-aggregation efficiently, but we don't have it.
Apparently, our storage is optimized for non-conflict use cases, which means that most rows will not have a conflict on PK, and the PK check in MaterializeExecutor is just a precaution to avoid a crash. On the contrary, this RFC assumes that incoming rows will have many conflicts by design, and hopes to resolve them in MaterializeExecutor. I don't think it's a good idea to encourage users to do this, and it's not worth optimizing for.
Also, as the proposal of it, I am not very confident that checking in MaterializeExecutor is a good idea. Maybe it will become a painpoint at some point and maybe users will ask us to remove it. I don't really recommend relying on it much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as the proposal of it, I am not very confident that checking in MaterializeExecutor is a good idea. Maybe it will become a painpoint at some point and maybe users will ask us to remove it. I don't really recommend relying on it much.
We can just treat the check in the MaterializeExecutor as a pluggable feature. Now it is implemented as a flag in MaterializeExecutor, so if users ask us to remove it we can remove it easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this is similar to Apache Doris' pre-aggregate data model, where the data of the same PK can be automatically merged according to some custom rules. However, as one of its selling points, Doris has a specially designed storage to do this pre-aggregation efficiently, but we don't have it.
https://doris.apache.org/docs/data-table/data-model/#aggregate-model just supports some limited behaviors.
AggregationTypecurrently has the following four ways of aggregation:
SUM: Sum, multi-line Value accumulation.
REPLACE: Instead, Values in the next batch of data will replace Values in rows previously imported.
MAX: Keep the maximum.
MIN: Keep the minimum.
In fact, if we just want to support this behavior, RisingWave has already been able to express it with a Create Materialized view
with an aggregate query.
The on conflicts
can take much more expressiveness to users.
|
||
## Summary | ||
|
||
allow user to declare the conflict behavior when the newly inserted row break the unique constraint of primary key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way, when do we do constraint checks? every time a row is materialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the logic will be implemented in the materialize executor of the table.
|
||
### Future possibilities | ||
|
||
We can support the `ON CONFLICT` clause in the Insert statement if we support attach the on conflict description for the chunk in `BatchInsert` executor and `StreamDML` executor to let `Materialize` executor know how to handle the conflict. But the on conflict clause is always needed in create table statement for the table with connector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still "stream path" between the DML
and Materialize
, also Exchange
if there's user-specified PK, so eventually it seems we need to attach this "description" to each row, which sounds like an overkill. 🤔
|
||
```SQL | ||
CREATE table t(k int primary key, int cnt) | ||
ON CONFLICT DO UPDATE SET cnt = t.cnt + EXCLUDED.cnt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't use aggregation call here:
select k, sum(cnt) from t;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a trivial example here to show that this grammar can let user declare their own aggregation logic with SQL. And the loigic could be more complex.
Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>
[ WHERE condition ] | ||
``` | ||
|
||
- **Defined when `CREATE TABLE` instead of `INSERT`** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can consider this as a config for the data from connector. i.e. What command the connector should use to write data - REPLACE or INSERT or anything else.
details and description in the RFC is out-of-date and the behavior of the feature is written in risingwavelabs/risingwave-docs#1860 |
No description provided.