StreamSets Data Collector Tutorials
The following tutorials demonstrate some StreamSets Data Collector features. Clone this repository to your machine to follow along and get familiar with using Data Collector.
Log Shipping to Elasticsearch - Read weblog files from a local filesystem directory, enrich some of the fields (e.g. with a GeoIP lookup), and write the records to Elasticsearch.
Creating a Custom StreamSets Origin - Build a simple custom origin that reads a Git repository's commit log and produces the corresponding records.
Creating a Custom StreamSets Processor - Build a simple custom processor that reads metadata tags from image files and writes them to the records as fields.
Creating a Custom StreamSets Destination - Build a simple custom destination that writes batches of records to a webhook.
Ingesting Drifting Data into Hive and Impala - Build a pipeline that handles schema changes in MySQL, creating and altering Hive tables accordingly.
Creating a StreamSets Spark Transformer - Build a simple Spark Transformer that computes a credit card's issuing network from its number.
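To give a feel for the kind of logic the Spark Transformer tutorial deals with, here is a minimal sketch of deriving a card's issuing network from its leading digits. It is plain Python rather than the tutorial's own code, the function name `issuing_network` and the `PREFIXES` table are hypothetical, and the prefix list is a deliberately simplified assumption that leaves out many real ranges.

```python
# Simplified, illustrative prefix table (an assumption, not the tutorial's data).
# Each entry maps a network name to the leading digits that identify it.
PREFIXES = [
    ("American Express", ("34", "37")),
    ("Visa", ("4",)),
    ("Mastercard", ("51", "52", "53", "54", "55")),
    ("Discover", ("6011", "65")),
]


def issuing_network(card_number: str) -> str:
    """Return the issuing network for a card number, or "Unknown"."""
    # Keep only digits so that spaces and dashes in the input are ignored.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    for network, prefixes in PREFIXES:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if digits.startswith(prefixes):
            return network
    return "Unknown"


if __name__ == "__main__":
    for number in ("4111 1111 1111 1111", "3782-822463-10005", "9999"):
        print(number, "->", issuing_network(number))
```

In a pipeline, a function like this would be applied to each record in a batch, with the result written to a new field.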
The Data Collector documentation also includes an extended tutorial that walks through basic Data Collector functionality, including creating, previewing, and running a pipeline, and creating alerts.
StreamSets Data Collector and its tutorials are built on open source technologies; the tutorials and accompanying code are licensed under the Apache License 2.0.
We welcome contributors! Please check out our guidelines to get started.