Name		Name	Last commit message	Last commit date
parent directory ..
images		images
MySQL_CDC_to_Snowflake.zip		MySQL_CDC_to_Snowflake.zip
README.md		README.md

README.md

MySQL CDC to Snowflake

Important: These instructions assume you have access to StreamSets Data Collector (v3.15+) and have performed all the prerequisites for MySQL and Snowflake

For help installing StreamSets Data Collector, see StreamSets Data Collector Installation.
For help with MySQL Binary Log prerequisites, see MySQL Binary Log.
For help with Snowflake prerequisites, see Snowflake.

Here is a link to a short video on using this pipeline template: Video Link

OVERVIEW

This pipeline demonstrates how to read change data capture (CDC) data from a MySQL database and replicate the changes to Snowflake.

Disclaimer: This pipeline is meant to serve as a template for performing MySQL binlog CDC to Snowflake. Some of the parameters, tables and fields may be different for your environment and may need additional customizations. Please consult the StreamSets documentation (linked below) for full information on configuration of each stage used below.

USING THE TEMPLATE

NOTE: Templates are supported in StreamSets Control Hub. If you do not have Control Hub, you can import the template pipeline in Data Collector but will need to do that each time you want to use the template.

PIPELINE

DOCUMENTATION

Snowflake Destination

STEP-BY-STEP

Step 1: Download the pipeline

Click Here to download the pipeline and save it to your drive.

Step 2: Import the pipeline

Click the down arrow next to the "Create New Pipeline" and select "Import Pipeline From Archive".

Click "Browse" and locate the pipeline file you just downloaded, click "OK", then click "Import"

Step 3: Configure the parameters

Click on the pipeline you just imported to open it and click on the "Parameters" tab and fill in the appropriate information for your environment.

Important: The pipeline template uses the most common default settings for things like the Snowflake region, staging location, etc. All of these are configurable and if you need to change those, you can opt to not use the built-in parameters and choose the appropriate settings yourself. Please refer to the documentation listed in this document for all the available options.

The following parameters are set up for this pipeline:

`mysql_hostname`	MySQL server hostname.
`mysql_port`	MySQL server port.
`mysql_serverid`	Replication server ID that the origin uses to connect to the master MySQL server. Must be unique from the server ID of the replication master and of all the other replication slaves. When the MySQL server database is enabled for GTID, the server ID is optional.
`mysql_username`	MySQL username. The user must have the following MySQL privileges: REPLICATION CLIENT REPLICATION SLAVE
`mysql_password`	MySQL password. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
`snowflake_account`	Snowflake account name.
`snowflake_user`	Snowflake user name.
`snowflake_password`	Snowflake password.
`snowflake_warehouse`	Snowflake warehouse.
`snowflake_database`	Snowflake database.
`snowflake_schema`	Snowflake schema.
`snowflake_stage_name`	Name of the Snowflake stage used to stage the data. Unless using a Snowflake internal user stage, you create this stage as part of the Snowflake prerequisite tasks. To use a Snowflake internal user stage, enter a tilde (`~`).

Step 4: Run the pipeline

Click the "START" button to run the pipeline.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MySQL CDC to Snowflake

MySQL CDC to Snowflake

images

images