Snowflake integration experiment #774
Use the Snowflake Kafka connector to send Feldera output to Snowflake via Snowpipe, ideally using the Snowpipe Streaming API. This experiment will require us to implement minimal Snowflake support and inform the design of a more complete Snowflake integration.
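For illustration, a minimal sketch of the connector side, assuming a hypothetical topic name, landing-table name, and placeholder credentials; property names follow the Snowflake Kafka connector's documented configuration, with `snowflake.ingestion.method` set to `SNOWPIPE_STREAMING`:

```properties
# Sketch only: topic and table names below are hypothetical placeholders.
name=feldera-snowflake-sink
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=1
topics=feldera_output
# Route the output topic into a landing table (see the commit below).
snowflake.topic2table.map=feldera_output:FELDERA_LANDING
snowflake.url.name=<account>.snowflakecomputing.com:443
snowflake.user.name=<user>
snowflake.private.key=<private-key>
snowflake.database.name=<database>
snowflake.schema.name=<schema>
snowflake.role.name=<role>
# Use the Snowpipe Streaming API rather than file-based Snowpipe.
snowflake.ingestion.method=SNOWPIPE_STREAMING
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```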
ryzhyk pushed a commit that referenced this issue on Oct 11, 2023:
We use the Snowflake Kafka connector with Snowpipe Streaming to send a stream of output changes from Feldera into Snowflake. The main challenge is that Snowpipe currently supports inserts but not updates or deletions. The workaround is to write all updates into a set of landing tables that mirror the target database schema and use a combination of Snowflake streams and tasks to incrementally apply these updates to the target tables by converting them into insert/update/delete commands.

The end-to-end process is:

* Feldera outputs updates to Kafka.
* The Snowflake Kafka connector converts them into a stream of inserts into the landing tables.
* We attach a Snowflake stream to each landing table to track changes to it. A periodic task reads updates from the stream, applies them to the target table, and removes them from the landing table.

At the moment the landing tables and the data ingestion logic (Snowflake streams and tasks) must be written by the user, but they can in principle be generated automatically.

TODO:

- Docs (#867)
- WebConsole support (#859)
- Support Snowflake's `TIMESTAMP` format (#862)
- Figure out how to apply multiple updates atomically (see snowflakedb/snowflake-kafka-connector#717)
- Test under load
- Automate the generation of landing tables and data ingestion tasks
- Figure out downtime and schema evolution ops

Addresses #774

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Completed via #868
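For reference, a minimal sketch of the hand-written ingestion logic the commit describes, with hypothetical table, stream, task, and warehouse names, and assuming each landing row carries an `action` column that distinguishes inserts from deletes in the Feldera change stream:

```sql
-- Hypothetical target table and a landing table mirroring its schema,
-- plus an action column populated from the Feldera change stream.
CREATE TABLE t (id INT, val VARCHAR);
CREATE TABLE t_landing (id INT, val VARCHAR, action VARCHAR);

-- Snowflake stream recording the connector's inserts into the landing table.
CREATE STREAM t_landing_stream ON TABLE t_landing;

-- Periodic task that drains the stream into the target table. Consuming
-- the stream in DML advances its offset, so each change is applied once;
-- a chained task (not shown) can then purge applied rows from t_landing.
CREATE TASK t_apply
  WAREHOUSE = ingest_wh
  SCHEDULE = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('T_LANDING_STREAM')
AS
  MERGE INTO t USING t_landing_stream s ON t.id = s.id
    WHEN MATCHED AND s.action = 'delete' THEN DELETE
    WHEN MATCHED AND s.action = 'insert' THEN UPDATE SET t.val = s.val
    WHEN NOT MATCHED AND s.action = 'insert' THEN INSERT (id, val) VALUES (s.id, s.val);

-- Tasks are created suspended; start the schedule explicitly.
ALTER TASK t_apply RESUME;
```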