RFC: on conflict clause #48

st1page · 2023-02-07T12:40:57Z

No description provided.

rfcs/0048-on-conflict-clause-in-create-table-statement.md

Co-authored-by: xxchan <xxchan22f@gmail.com>

rfcs/0048-on-conflict-clause-in-create-table-statement.md

Co-authored-by: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>

wcy-fdu · 2023-02-08T05:38:02Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+    (1 row)
+  ```
+
+  The strangest part is that the input parameter and return value of the "reduce function" must have the same schema, but this can easily be worked around with constant expressions.


Just curious about how schema change will handle this case.

st1page · Feb 8, 2023

I think the mechanism for handling an outdated schema does not conflict with this RFC. 🤔 The same issue already exists without this RFC.

fuyufjh · 2023-02-09T09:54:33Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+      COALESCE(EXCLUDED.b, b));
+  ```
+
+  This feature will be useful when users adopt a wide-table data model.


As I understand it, this is similar to Apache Doris' pre-aggregate data model, where the data of the same PK can be automatically merged according to some custom rules. However, as one of its selling points, Doris has a specially designed storage to do this pre-aggregation efficiently, but we don't have it.

Apparently, our storage is optimized for non-conflict use cases, which means that most rows will not have a conflict on PK, and the PK check in MaterializeExecutor is just a precaution to avoid a crash. On the contrary, this RFC assumes that incoming rows will have many conflicts by design, and hopes to resolve them in MaterializeExecutor. I don't think it's a good idea to encourage users to do this, and it's not worth optimizing for.

Also, as the one who proposed it, I am not very confident that checking in MaterializeExecutor is a good idea. It may become a pain point at some point, and users may ask us to remove it. I don't recommend relying on it much.

st1page · Feb 13, 2023

> Also, as the one who proposed it, I am not very confident that checking in MaterializeExecutor is a good idea. It may become a pain point at some point, and users may ask us to remove it. I don't recommend relying on it much.

We can just treat the check in MaterializeExecutor as a pluggable feature. It is currently implemented as a flag in MaterializeExecutor, so if users ask us to remove it, we can do so easily.

st1page · Feb 16, 2023

> As I understand it, this is similar to Apache Doris' pre-aggregate data model, where the data of the same PK can be automatically merged according to some custom rules. However, as one of its selling points, Doris has a specially designed storage to do this pre-aggregation efficiently, but we don't have it.

https://doris.apache.org/docs/data-table/data-model/#aggregate-model supports only a limited set of behaviors:

> AggregationType currently has the following four ways of aggregation:
>
> - SUM: Sum, multi-line Value accumulation.
> - REPLACE: Instead, Values in the next batch of data will replace Values in rows previously imported.
> - MAX: Keep the maximum.
> - MIN: Keep the minimum.

In fact, if we only wanted to support this behavior, RisingWave can already express it with a `CREATE MATERIALIZED VIEW` over an aggregate query.
The `ON CONFLICT` clause gives users much more expressiveness.
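To make the difference in expressiveness concrete, here is a minimal Python sketch. It is not RisingWave code: `upsert`, the `merge_*` functions, and the in-memory `dict` table are hypothetical names chosen for illustration. It models a user-declared conflict rule as a function from the existing row and the incoming (`EXCLUDED`) row to the merged row. The four aggregation types quoted above then become four particular rules, while a `COALESCE`-style wide-table merge (assuming two nullable columns `a` and `b`) is a rule outside that list.

```python
from typing import Callable, Dict, Optional, Tuple

Row = Tuple[Optional[int], ...]    # non-key columns; None stands in for SQL NULL
Merge = Callable[[Row, Row], Row]  # (existing row, EXCLUDED row) -> merged row


def upsert(table: Dict[int, Row], key: int, excluded: Row, merge: Merge) -> None:
    """Insert `excluded` under `key`, or apply `merge` if the key already exists."""
    table[key] = merge(table[key], excluded) if key in table else excluded


# The four fixed aggregation types from the quote, written as merge rules.
def merge_sum(old: Row, new: Row) -> Row:
    return tuple(o + n for o, n in zip(old, new))


def merge_replace(old: Row, new: Row) -> Row:
    return new


def merge_max(old: Row, new: Row) -> Row:
    return tuple(max(o, n) for o, n in zip(old, new))


def merge_min(old: Row, new: Row) -> Row:
    return tuple(min(o, n) for o, n in zip(old, new))


# A rule in the spirit of `SET b = COALESCE(EXCLUDED.b, b)`: keep the old value
# wherever the incoming row carries NULL (hypothetical two-column wide table).
def merge_coalesce(old: Row, new: Row) -> Row:
    return tuple(o if n is None else n for o, n in zip(old, new))


counts: Dict[int, Row] = {}
for key, row in [(1, (2,)), (1, (3,)), (2, (7,))]:
    upsert(counts, key, row, merge_sum)

wide: Dict[int, Row] = {}
upsert(wide, 1, (10, None), merge_coalesce)  # first source fills column a
upsert(wide, 1, (None, 20), merge_coalesce)  # second source fills column b

print(counts)  # {1: (5,), 2: (7,)}
print(wide)    # {1: (10, 20)}
```

The point of the sketch is only that the rule is an arbitrary function of the two rows; how efficiently a storage engine can evaluate it is the separate concern raised earlier in this thread.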

twocode · 2023-02-14T05:28:01Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+
+## Summary
+
+Allow users to declare the conflict behavior when a newly inserted row violates the unique constraint of the primary key.


By the way, when do we do the constraint checks? Every time a row is materialized?

st1page · Feb 14, 2023

Yes, the logic will be implemented in the materialize executor of the table.


BugenZhao · 2023-02-15T08:04:18Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+
+### Future possibilities
+
+  We could support the `ON CONFLICT` clause in the `INSERT` statement if the `BatchInsert` and `StreamDML` executors could attach an on-conflict description to each chunk, so that the `Materialize` executor knows how to handle the conflict. However, the `ON CONFLICT` clause is still needed in the `CREATE TABLE` statement for tables with a connector.


There's still a "stream path" between DML and Materialize, plus an Exchange if there's a user-specified PK, so it seems we would eventually need to attach this "description" to each row, which sounds like overkill. 🤔
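A small Python sketch of why a per-chunk description would not survive the shuffle. Everything here is a stand-in: `exchange` models hash distribution by primary key with a plain modulo, and the labels `"do_nothing"` and `"do_update"` are hypothetical conflict descriptions attached to two different `INSERT` statements. None of this is RisingWave's actual executor code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Chunk = Tuple[str, List[Tuple[int, str]]]  # (conflict description, [(pk, value), ...])


def exchange(chunks: List[Chunk], parallelism: int) -> Dict[int, List[Tuple[int, str, str]]]:
    """Redistribute rows by pk; each row must carry its own description along."""
    shards: Dict[int, List[Tuple[int, str, str]]] = defaultdict(list)
    for description, rows in chunks:
        for pk, value in rows:
            shards[pk % parallelism].append((pk, value, description))
    return dict(shards)


# Two INSERT statements, each with its own (hypothetical) conflict description.
chunks: List[Chunk] = [
    ("do_nothing", [(1, "a"), (2, "b")]),
    ("do_update", [(3, "c"), (4, "d")]),
]

shards = exchange(chunks, parallelism=2)

# After the shuffle, every downstream shard sees rows from both statements,
# so one description per chunk is no longer enough.
descriptions_per_shard: Dict[int, Set[str]] = {
    shard: {description for _, _, description in rows} for shard, rows in shards.items()
}
print(descriptions_per_shard)
```

In this toy setup both shards end up with rows from both statements, which is the situation where the description has to be tracked per row.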


liurenjie1024 · 2023-02-15T09:31:01Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+
+  ```SQL
+    CREATE TABLE t(k int primary key, cnt int)
+    ON CONFLICT DO UPDATE SET cnt = t.cnt + EXCLUDED.cnt


Why don't we use an aggregation call here:

select k, sum(cnt) from t group by k;

st1page · Feb 16, 2023

It is just a trivial example to show that this grammar lets users declare their own aggregation logic in SQL. The logic could be more complex.
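For this particular example the two formulations do coincide, which the following Python sketch checks on a small, made-up input. `apply_on_conflict` and `group_by_sum` are hypothetical helper names; the first mimics `DO UPDATE SET cnt = t.cnt + EXCLUDED.cnt` row by row, the second mimics the aggregate query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

rows: List[Tuple[int, int]] = [(1, 2), (2, 5), (1, 3), (2, 1), (1, 4)]  # (k, cnt)


def apply_on_conflict(rows: List[Tuple[int, int]]) -> Dict[int, int]:
    """Row-at-a-time insert with `cnt = t.cnt + EXCLUDED.cnt` on key conflict."""
    table: Dict[int, int] = {}
    for k, cnt in rows:
        if k in table:
            table[k] = table[k] + cnt  # conflict: merge with the stored row
        else:
            table[k] = cnt             # no conflict: plain insert
    return table


def group_by_sum(rows: List[Tuple[int, int]]) -> Dict[int, int]:
    """Equivalent of `select k, sum(cnt) ... group by k` over all inserted rows."""
    totals: Dict[int, int] = defaultdict(int)
    for k, cnt in rows:
        totals[k] += cnt
    return dict(totals)


print(apply_on_conflict(rows))  # {1: 9, 2: 6}
print(group_by_sum(rows))       # {1: 9, 2: 6}
```

The equivalence holds here because addition is a standard aggregate; the clause becomes interesting once the update expression is something a built-in aggregate does not provide.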

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>


fuyufjh · 2023-02-20T06:16:59Z

rfcs/0048-on-conflict-clause-in-create-table-statement.md

+              [ WHERE condition ]
+```
+
+- **Defined when `CREATE TABLE` instead of `INSERT`**


I think we can consider this a config for data coming from the connector, i.e. which command the connector should use to write data: REPLACE, INSERT, or something else.

st1page · 2024-03-01T12:21:57Z

The details and description in this RFC are out of date; the feature's current behavior is documented in risingwavelabs/risingwave-docs#1860.

st1page added 6 commits February 2, 2023 11:43

tmp (dc220ef)

motivation (781550e)

tmp (8c1aa5d)

explain the syntax (77e624f)

finish (c22b7e6)

change name (fe94d0e)

xxchan reviewed Feb 7, 2023

rfcs/0048-on-conflict-clause-in-create-table-statement.md (outdated, resolved)

Update rfcs/0048-on-conflict-clause-in-create-table-statement.md (c85fc29)

Co-authored-by: xxchan <xxchan22f@gmail.com>

xiangjinwu reviewed Feb 8, 2023

rfcs/0048-on-conflict-clause-in-create-table-statement.md (outdated, resolved)

Update rfcs/0048-on-conflict-clause-in-create-table-statement.md (45c255f)

Co-authored-by: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>

wcy-fdu reviewed Feb 8, 2023

fuyufjh reviewed Feb 9, 2023

twocode reviewed Feb 14, 2023

BugenZhao reviewed Feb 15, 2023

liurenjie1024 reviewed Feb 15, 2023

st1page and others added 2 commits February 16, 2023 20:01

fix typo (3e55ebf)

Update rfcs/0048-on-conflict-clause-in-create-table-statement.md (e5d2fdf)

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

fuyufjh reviewed Feb 20, 2023

rfcs/0048-on-conflict-clause-in-create-table-statement.md (resolved)

fuyufjh reviewed Feb 20, 2023

st1page mentioned this pull request Apr 3, 2023

bug / known issue: inserts may be disordered even on one session risingwavelabs/risingwave#7213

Open

st1page closed this Mar 1, 2024


st1page commented Feb 7, 2023

wcy-fdu Feb 8, 2023

st1page Feb 8, 2023

fuyufjh Feb 9, 2023 (edited)

st1page Feb 13, 2023 (edited)

st1page Feb 16, 2023

twocode Feb 14, 2023

st1page Feb 14, 2023

BugenZhao Feb 15, 2023 (edited)

liurenjie1024 Feb 15, 2023

st1page Feb 16, 2023

fuyufjh Feb 20, 2023

st1page commented Mar 1, 2024

