Snowflake connector configuration dialog #859

Closed
2 tasks
ryzhyk opened this issue Oct 9, 2023 · 0 comments · Fixed by #881
Labels
adapters (Issues related to the adapters crate), Web Console (Related to the browser based UI)

Comments

@ryzhyk
Contributor

ryzhyk commented Oct 9, 2023

The design is very similar to what we did for Debezium, except that this one is based on the output Kafka connector rather than the input one.

Example config this connector should produce:

    stream: PREFERRED_VENDOR
    transport:
      name: kafka
      config:
        topic: snowflake.preferred_vendor
    format:
      name: json
      config:
        update_format: snowflake
    max_buffered_records: 1000000

The transport section is just a standard Kafka transport config. The format section selects the json format with update_format set to "snowflake" (our normal JSON format config dialog does not allow choosing the update format).
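
For illustration only, here is the same config expressed as a typed object, roughly how the dialog might assemble it before serializing. The SnowflakeKafkaOutputConfig shape and the bootstrap.servers placeholder are assumptions for this sketch, not the actual WebConsole types:

    // Sketch only: the connector config above, expressed as a typed object the
    // way the WebConsole dialog might assemble it before serialization.
    interface SnowflakeKafkaOutputConfig {
      stream: string
      transport: { name: 'kafka'; config: Record<string, string> }
      format: { name: 'json'; config: { update_format: 'snowflake' } }
      max_buffered_records: number
    }

    const exampleConfig: SnowflakeKafkaOutputConfig = {
      stream: 'PREFERRED_VENDOR',
      transport: {
        name: 'kafka',
        config: {
          topic: 'snowflake.preferred_vendor',
          // Hypothetical: whatever standard Kafka transport settings the
          // Server and security tabs collect would also land here.
          'bootstrap.servers': 'localhost:9092'
        }
      },
      format: {
        name: 'json',
        config: { update_format: 'snowflake' }
      },
      max_buffered_records: 1000000
    }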

Design

  • Add a new Kafka-Snowflake output connector type to WebConsole. Since this is just a Kafka connector, the WebConsole must rely on the update_format field to distinguish it from normal Kafka connectors when listing connectors or when opening a connector edit dialog (see the sketch after this list).
  • The config dialog is based on the Kafka config dialog with the following changes:
    • "New Kafka Output" -> "New Kafka-Snowflake output"
    • "Add a Kafka output" -> "Output to a Snowflake table via a Kafka topic"
    • The Details, Server and security tabs need no modifications.
    • The Format tab has only one config option, Data format, and the only supported data format is "JSON" (we will add "AVRO" in the future, so we could show it as a disabled option).
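
As a rough sketch of the detection rule in the first bullet above, assuming a hypothetical ConnectorDescriptor shape (the real WebConsole types may differ):

    // Minimal sketch, not actual WebConsole code: classify a stored connector by
    // inspecting its transport name and the JSON update_format field.
    type ConnectorKind = 'KafkaOutput' | 'KafkaSnowflakeOutput' | 'Other'

    interface ConnectorDescriptor {
      transport?: { name?: string }
      format?: { name?: string; config?: { update_format?: string } }
    }

    function classifyConnector(c: ConnectorDescriptor): ConnectorKind {
      if (c.transport?.name !== 'kafka') {
        return 'Other'
      }
      // A Kafka output whose JSON format uses the snowflake update format is
      // listed and edited as a Kafka-Snowflake output connector.
      if (c.format?.name === 'json' && c.format.config?.update_format === 'snowflake') {
        return 'KafkaSnowflakeOutput'
      }
      return 'KafkaOutput'
    }

    // e.g. classifyConnector(exampleConfig) === 'KafkaSnowflakeOutput'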
ryzhyk added the adapters and Web Console labels Oct 9, 2023
ryzhyk added this to the October 24, 2023 milestone Oct 9, 2023
ryzhyk pushed a commit that referenced this issue Oct 11, 2023
We use the Snowflake Kafka connector with Snowpipe Streaming to send
a stream of output changes from Feldera into Snowflake.  The main
challenge is that Snowpipe currently supports inserts but not updates
or deletions.  The workaround is to write all updates into a set of
landing tables that mirror the target database schema and use a
combination of Snowflake streams to incrementally apply these updates
to target tables by converting them into insert/update/delete commands.

So the end-to-end process is:

* Feldera outputs updates to Kafka
* The Snowflake Kafka connector converts them into a stream of inserts
  into landing tables.
* We attach a Snowpipe Stream to each landing table to track changes
  to the table.  A periodic task reads updates from the stream,
  applies them to the target table and removes them from the landing
  tables.

At the moment the landing tables and the data ingestion logic (Snowflake
streams and tasks) must be written by the user, but they can in
principle be automatically generated.

TODO:
- Docs (#867)
- WebConsole support (#859)
- Support Snowflake's `TIMESTAMP` format (#862)
- Figure out how to apply multiple updates atomically
  (See: snowflakedb/snowflake-kafka-connector#717)
- Test under load.
- Automate the generation of landing tables and data ingestion tasks.
- Figure out downtime and schema evolution ops.

Addresses #774

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>