add on conflict clause in create table statement #1860

Merged
merged 21 commits into from
Jun 25, 2024
Changes from 6 commits
17 changes: 17 additions & 0 deletions docs/sql/commands/sql-create-table.md
@@ -24,6 +24,7 @@ CREATE TABLE [ IF NOT EXISTS ] table_name (
[ watermark_clause ]
)
[ APPEND ONLY ]
[ ON CONFLICT conflict_action ]
[ WITH (
connector='connector_name',
connector_parameter='value', ...)]
@@ -75,13 +76,29 @@ FORMAT upsert ENCODE AVRO (
|`generation_expression`| The expression for the generated column. For details about generated columns, see [Generated columns](/sql/query-syntax/query-syntax-generated-columns.md).|
|`watermark_clause`| A clause that defines the watermark for a timestamp column. The syntax is `WATERMARK FOR column_name as expr`. For the watermark clause to be valid, the table must be an append-only table. That is, the `APPEND ONLY` option must be specified. This restriction only applies to a table. For details about watermarks, refer to [Watermarks](/transform/watermarks.md).|
|`APPEND ONLY` | When this option is specified, the table will be created as an append-only table. An append-only table cannot have primary keys. `UPDATE` and `DELETE` statements are not valid for append-only tables. Note that append-only tables are a Beta feature. |
|`ON CONFLICT` | Specifies the alternative action to take when a newly inserted record violates the table's `PRIMARY KEY` constraint. See the "PK Conflict Behavior" section below for more information. |
|**WITH** clause |Specify the connector settings here if you want to store all the source data. See the [Data ingestion](/ingest/data-ingestion.md) page for the full list of supported sources as well as links to specific connector pages detailing the syntax for each source. |
|**FORMAT** and **ENCODE** options |Specify the data format and the encoding format of the source data. To learn about the supported data formats, see [Data formats](sql-create-source.md#supported-formats). |

## Watermarks

RisingWave supports generating watermarks when creating an append-only streaming table. Watermarks are like markers or signals that track the progress of event time, allowing you to process events within their corresponding time windows. For more information on the syntax for creating a watermark, see [Watermarks](/transform/watermarks.md).

## PK Conflict Behavior
A record with an "insert" operation can introduce a duplicate of an existing primary key in the table. In that case, the alternative action specified by the `ON CONFLICT` clause is applied. The record can come from an `INSERT` DML statement, from the table's external connector, or from a sink into the table ([`CREATE SINK INTO`](sql-create-sink-into.md)).

The action can be one of the following:
- `DO NOTHING`: Ignore the newly inserted record.
- `DO UPDATE FULL`: Replace the existing row in the table.
- `DO UPDATE IF NOT NULL`: Replace only the fields that are not NULL in the inserted row.

:::note
The "delete" and "update" operations on the table cannot violate the table's primary key constraint, so the `ON CONFLICT` option has no effect on them.
:::

:::note
When the `DO UPDATE IF NOT NULL` behavior is applied, the `DEFAULT` clause is not allowed on the table's columns.
:::
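
The three actions described above can be compared side by side with a small sketch. The table names `t_nothing`, `t_full`, and `t_merge` are hypothetical and used only for illustration; the comments restate the intended effect from the option descriptions rather than captured output.

```sql
-- Hypothetical tables, one per conflict action (names are illustrative only).
CREATE TABLE t_nothing (k int primary key, v1 int, v2 int) ON CONFLICT DO NOTHING;
CREATE TABLE t_full    (k int primary key, v1 int, v2 int) ON CONFLICT DO UPDATE FULL;
CREATE TABLE t_merge   (k int primary key, v1 int, v2 int) ON CONFLICT DO UPDATE IF NOT NULL;

-- First insert: the key 1 does not exist yet, so no conflict occurs.
INSERT INTO t_nothing VALUES (1, 10, 100);
INSERT INTO t_full    VALUES (1, 10, 100);
INSERT INTO t_merge   VALUES (1, 10, 100);

-- Second insert: same key, new v1, NULL v2. This triggers the conflict action.
INSERT INTO t_nothing VALUES (1, 20, NULL); -- ignored: the row is meant to stay (1, 10, 100)
INSERT INTO t_full    VALUES (1, 20, NULL); -- whole row replaced: (1, 20, NULL)
INSERT INTO t_merge   VALUES (1, 20, NULL); -- only the non-NULL field replaced: (1, 20, 100)
```
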
## Examples

The statement below creates a table that has three columns.
85 changes: 85 additions & 0 deletions docs/transform/multiple-table-sink.md
@@ -0,0 +1,85 @@
---
id: multiple-table-sink
slug: /multiple-table-sink
title: Multiple Table Sink
---
<head>
<link rel="canonical" href="https://docs.risingwave.com/docs/current/multiple-table-sink/" />
</head>

This guide describes a way to maintain a wide table whose columns come from different sources. Traditional data warehouses and ETL pipelines use a join query for this, but a streaming join brings issues such as low efficiency and large memory consumption.

In some cases, subject to limitations, using [CREATE SINK INTO TABLE](/commands/sql-create-sink-into.md) together with the [ON CONFLICT clause](/commands/sql-create-table.md#pk-conflict-behavior) can save resources and achieve high efficiency.

## Merge multiple sinks with the same primary key

:::note
Keep in mind that the `ON CONFLICT` clause does not affect update or delete events. The sinks should be forced to be append-only; otherwise, a delete or update event from any sink will delete the corresponding row.
:::

```sql
CREATE TABLE d1(v1 int, k int primary key);
CREATE TABLE d2(v2 int, k int primary key);
CREATE TABLE d3(v3 int, k int primary key);
CREATE TABLE wide_d(v1 int, v2 int, v3 int, k int primary key)
ON CONFLICT DO UPDATE IF NOT NULL;

CREATE SINK sink1 INTO wide_d AS SELECT v1, NULL, NULL, k FROM d1
with (
type = 'append-only',
force_append_only = 'true',
);
CREATE SINK sink2 INTO wide_d AS SELECT NULL, v2, NULL, k FROM d2
with (
type = 'append-only',
force_append_only = 'true',
);
CREATE SINK sink3 INTO wide_d AS SELECT NULL, NULL, v3, k FROM d3
with (
type = 'append-only',
force_append_only = 'true',
);
```
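
To illustrate how the three sinks above are meant to fill in `wide_d`, here is a small usage sketch. The inserted values are made up, and `FLUSH` is included on the assumption that it makes the pending changes visible to the following query. The final comment describes the intended outcome under `DO UPDATE IF NOT NULL`, not verified output.

```sql
-- Each source table contributes one column for the same key k = 1.
INSERT INTO d1 VALUES (11, 1); -- (v1, k)
INSERT INTO d2 VALUES (22, 1); -- (v2, k)
INSERT INTO d3 VALUES (33, 1); -- (v3, k)

-- Assumption: FLUSH makes the changes visible before the table is queried.
FLUSH;

SELECT * FROM wide_d WHERE k = 1;
-- Intended result: a single row with v1 = 11, v2 = 22, v3 = 33, k = 1,
-- because each sink only overwrites the column for which it supplies a non-NULL value.
```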

## Enrich data with foreign keys in Star/Snowflake Schema Model

In a star schema, the data is organized as a central fact table surrounded by several related dimension tables. Each dimension table is joined to the fact table through a foreign key relationship. Because the join key is the primary key of each dimension table, we can rewrite the join query as a series of sinks into a table.
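
For comparison, the join-based form that this approach replaces might look roughly like the sketch below, over a fact table `fact(pk, k1, k2, k3)` and dimension tables `d1`, `d2`, `d3` keyed by `pk`. The materialized view name `wide_fact_mv` is hypothetical, and the choice of `LEFT JOIN` is an assumption made so that fact rows without a matching dimension row still appear; it is not an exact equivalent of the sink-based version.

```sql
-- Hypothetical join-based counterpart, shown only for comparison.
CREATE MATERIALIZED VIEW wide_fact_mv AS
SELECT fact.pk, d1.v AS v1, d2.v AS v2, d3.v AS v3
FROM fact
LEFT JOIN d1 ON fact.k1 = d1.pk
LEFT JOIN d2 ON fact.k2 = d2.pk
LEFT JOIN d3 ON fact.k3 = d3.pk;
```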

```sql
CREATE TABLE fact(pk int primary key, k1 int, k2 int, k3 int);
CREATE TABLE d1(pk int primary key, v int);
CREATE TABLE d2(pk int primary key, v int);
CREATE TABLE d3(pk int primary key, v int);

CREATE TABLE wide_fact(pk int primary key, v1 int, v2 int, v3 int)
ON CONFLICT DO UPDATE IF NOT NULL;

/* The main sink is not force-append-only, so that it controls whether the record exists. */
CREATE SINK fact_sink INTO wide_fact AS
SELECT pk, NULL, NULL, NULL
FROM fact;

CREATE SINK sink1 INTO wide_fact AS
SELECT fact.pk, d1.v, NULL, NULL
FROM fact JOIN d1 ON fact.k1 = d1.pk
with (
type = 'append-only',
force_append_only = 'true',
);

CREATE SINK sink2 INTO wide_fact AS
SELECT fact.pk, NULL, d2.v, NULL
FROM fact JOIN d2 ON fact.k2 = d2.pk
with (
type = 'append-only',
force_append_only = 'true',
);

CREATE SINK sink3 INTO wide_fact AS
SELECT fact.pk, NULL, NULL, d3.v
FROM fact JOIN d3 ON fact.k3 = d3.pk
with (
type = 'append-only',
force_append_only = 'true',
);
```
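
A short usage sketch for the sinks above, with made-up values. `FLUSH` is included on the assumption that it makes the pending changes visible to the following query, and the comments describe the intended behavior as stated in this guide rather than captured output.

```sql
-- Populate the dimension tables, then a fact row that references them.
INSERT INTO d1 VALUES (1, 100); -- (pk, v)
INSERT INTO d2 VALUES (2, 200);
INSERT INTO d3 VALUES (3, 300);
INSERT INTO fact VALUES (10, 1, 2, 3); -- (pk, k1, k2, k3)
FLUSH;

SELECT * FROM wide_fact WHERE pk = 10;
-- Intended result: pk = 10, v1 = 100, v2 = 200, v3 = 300.

-- fact_sink is not append-only, so deleting the fact row is meant to
-- remove the corresponding row from wide_fact as well.
DELETE FROM fact WHERE pk = 10;
FLUSH;
```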
5 changes: 5 additions & 0 deletions sidebars.js
@@ -379,6 +379,11 @@ const sidebars = {
id: "transform/use-dbt",
label: "Use dbt for data transformations",
},
{
type: "doc",
id: "transform/multiple-table-sink",
label: "Maintain wide table with table sinks",
},
],
},
{