Snowflake integration experiment #774

Closed
2 tasks done
ryzhyk opened this issue Sep 25, 2023 · 1 comment
Assignees: ryzhyk
Labels: adapters (Issues related to the adapters crate)

ryzhyk commented Sep 25, 2023

Use the Snowflake Kafka connector to send Feldera output to Snowflake via Snowpipe, ideally using the Snowpipe Streaming API. This experiment requires implementing minimal Snowflake support and will inform the design of a more complete Snowflake integration.

  • Implement Feldera Snowflake connector
  • Create a Feldera->Snowflake demo
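For the demo task, the Feldera side would presumably be an ordinary Feldera SQL program whose output view is streamed to Kafka and, via the Snowflake Kafka connector, into Snowflake. A minimal hypothetical sketch (table, view, and column names are illustrative, not part of any shipped demo):

```sql
-- Hypothetical Feldera program for a Feldera->Snowflake demo.
-- The EXPENSIVE_PRODUCT view is the output that would be pushed to Kafka.
CREATE TABLE PRODUCT (
    ID    INT NOT NULL,
    NAME  VARCHAR,
    PRICE DECIMAL(10, 2)
);

CREATE VIEW EXPENSIVE_PRODUCT AS
SELECT ID, NAME, PRICE
FROM PRODUCT
WHERE PRICE > 100;
```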
ryzhyk added the adapters label Sep 25, 2023
ryzhyk added this to the v0.1.5 milestone Sep 25, 2023
ryzhyk self-assigned this Sep 25, 2023
ryzhyk pushed a commit that referenced this issue Oct 11, 2023
We use the Snowflake Kafka connector with Snowpipe Streaming to send
a stream of output changes from Feldera into Snowflake.  The main
challenge is that Snowpipe currently supports inserts but not updates
or deletions.  The workaround is to write all updates into a set of
landing tables that mirror the target database schema and to use
Snowflake streams and tasks to incrementally apply these updates to the
target tables, converting them into insert/update/delete statements.
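As a concrete illustration, a landing table for the hypothetical EXPENSIVE_PRODUCT view above might look as follows. The __ACTION change-type column is an assumption of this sketch, not the actual format the connector emits:

```sql
-- Hypothetical target table maintained in Snowflake.
CREATE OR REPLACE TABLE EXPENSIVE_PRODUCT (
    ID    NUMBER,
    NAME  VARCHAR,
    PRICE NUMBER(10, 2)
);

-- Landing table mirroring the target schema.  The Kafka connector only
-- ever inserts into it; the __ACTION column (assumed here) records
-- whether a row encodes an insert or a delete.
CREATE OR REPLACE TABLE EXPENSIVE_PRODUCT_LANDING (
    ID       NUMBER,
    NAME     VARCHAR,
    PRICE    NUMBER(10, 2),
    __ACTION VARCHAR
);
```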

So the end-to-end process is:

* Feldera outputs updates to Kafka.
* The Snowflake Kafka connector converts them into a stream of inserts
  into landing tables.
* We attach a Snowflake stream to each landing table to track changes
  to the table (sketched after this list).  A periodic task reads
  updates from the stream, applies them to the target table, and
  removes them from the landing table.
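A minimal sketch of such a stream on the hypothetical EXPENSIVE_PRODUCT_LANDING table from above:

```sql
-- Snowflake stream recording every row the Kafka connector appends to
-- the landing table.
CREATE OR REPLACE STREAM EXPENSIVE_PRODUCT_LANDING_STREAM
    ON TABLE EXPENSIVE_PRODUCT_LANDING;

-- Pending, not-yet-applied changes can be inspected with a plain SELECT
-- (reading a stream does not consume it; only DML advances its offset).
SELECT * FROM EXPENSIVE_PRODUCT_LANDING_STREAM;
```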

At the moment, the landing tables and the data ingestion logic (Snowflake
streams and tasks) must be written by the user, but they could in
principle be generated automatically.
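A rough sketch of the task half of that logic, continuing the hypothetical example above (the warehouse name, schedule, and __ACTION convention are assumptions, and the sketch assumes at most one pending change per key per run):

```sql
-- Periodic task that merges pending changes from the stream into the
-- target table.  Consuming the stream in a DML statement advances its
-- offset, so the same rows are not applied twice; cleaning up the
-- landing table itself would need an extra statement or a procedure.
CREATE OR REPLACE TASK APPLY_EXPENSIVE_PRODUCT_CHANGES
    WAREHOUSE = MY_WH            -- hypothetical warehouse
    SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('EXPENSIVE_PRODUCT_LANDING_STREAM')
AS
MERGE INTO EXPENSIVE_PRODUCT AS t
USING EXPENSIVE_PRODUCT_LANDING_STREAM AS s
    ON t.ID = s.ID
WHEN MATCHED AND s.__ACTION = 'delete' THEN DELETE
WHEN MATCHED AND s.__ACTION = 'insert' THEN
    UPDATE SET t.NAME = s.NAME, t.PRICE = s.PRICE
WHEN NOT MATCHED AND s.__ACTION = 'insert' THEN
    INSERT (ID, NAME, PRICE) VALUES (s.ID, s.NAME, s.PRICE);

-- Tasks are created in a suspended state and must be resumed explicitly.
ALTER TASK APPLY_EXPENSIVE_PRODUCT_CHANGES RESUME;
```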

TODO:
- Docs (#867)
- WebConsole support (#859)
- Support Snowflake's `TIMESTAMP` format (#862)
- Figure out how to apply multiple updates atomically
  (See: snowflakedb/snowflake-kafka-connector#717)
- Test under load.
- Automate the generation of landing tables and data ingestion tasks.
- Work out operational procedures for downtime and schema evolution.

Addresses #774

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>

ryzhyk commented Oct 11, 2023

Completed via #868

ryzhyk closed this as completed Oct 11, 2023