
Data Collector: Sample Pipelines

This folder contains pipeline templates and samples for StreamSets Data Collector.

The following templates/samples are currently available:

| Name | Description |
| --- | --- |
| Citi Bike real-time system data (Basic) | Reads unstructured and hierarchical data from a REST API and converts it to relational format |
| Date Conversions | Converts dates from strings to various datetime formats and time zones using the Field Type Converter and Expression Evaluator processors |
| Drift Synchronization for Hive | Drift synchronization from MySQL to the Cloudera distribution of Apache Hive and Apache Impala |
| Hadoop FS to ADLS Gen2 | Loads data from Hadoop FS to ADLS Gen2 after applying some transformations |
| ML - TensorFlow Binary Classification | Loads a pre-trained TensorFlow model to classify a cancer condition as either benign or malignant |
| MySQL CDC to Delta Lake | Reads MySQL change data capture (CDC) data and writes to Databricks Delta Lake |
| MySQL CDC to S3 to Snowflake | Reads MySQL change data capture (CDC) data, writes it to S3, then reads from S3 and writes to Snowflake |
| MySQL CDC to Snowflake | Reads MySQL change data capture (CDC) data and writes to Snowflake |
| MySQL Schema Replication to Azure Synapse SQL | Bulk loads data from MySQL into Azure Synapse SQL |
| MySQL Schema replication to Delta Lake | Bulk loads data from MySQL into Databricks Delta Lake |
| MySQL binlog to DeltaLake | Reads MySQL binlog change data and writes to Databricks Delta Lake |
| NYC Taxi Ride Payment Type (Basic) | Reads data from a directory, processes it, routes it, masks sensitive data, and writes it to another file system in a different data format |
| NYC Taxi Ride Payment Type (with Jython) | Reads data from a directory, processes it using Jython, routes it, masks sensitive data, and writes it to another file system in a different data format |
| Oracle 19c Bulk Ingest and CDC to Databricks Delta Lake | Bulk ingests data from Oracle 19c and processes change data capture (CDC) data into Databricks Delta Lake |
| Oracle CDC to Delta Lake | Reads change data capture (CDC) data from Oracle and writes to Databricks Delta Lake |
| Oracle CDC to Snowflake | Reads change data capture (CDC) data from Oracle and writes to Snowflake |
| Parse Twitter Data to JSON | Parses raw Twitter data and stores curated data in JSON format |
| Parse Web Logs to JSON and Avro | Parses raw web logs ingested in Common Log Format and stores curated data in JSON and Avro formats |
| PostgreSQL CDC to Delta Lake | Reads change data capture (CDC) data from PostgreSQL and writes to Databricks Delta Lake |
| PostgreSQL CDC to Snowflake | Reads change data capture (CDC) data from PostgreSQL and writes to Snowflake |
| SQLServer CDC to Delta Lake | Reads change data capture (CDC) data from SQL Server and writes to Databricks Delta Lake |
| SQLServer CDC to Snowflake | Reads change data capture (CDC) data from SQL Server and writes to Snowflake |
| Salesforce CDC to Delta Lake | Reads change data capture (CDC) data from Salesforce and writes to Databricks Delta Lake |
| Salesforce CDC to Snowflake | Reads change data capture (CDC) data from Salesforce and writes to Snowflake |
| Salesforce to Delta Lake | Bulk loads data from Salesforce accounts into Databricks Delta Lake |
| Working with XML (Basic) | Reads and processes XML data in Data Collector |
| aws-marketplace-reports | Bulk loads data from Salesforce accounts into Databricks Delta Lake |

Help

For questions or comments about these pipelines, reach out on any of these channels:

- Chat on Slack
- User Group
- Ask StreamSets