
Transformer: Sample Pipelines

This folder contains pipeline templates and samples for Transformer.

The following templates/samples are currently available:

| Name | Description |
| --- | --- |
| Clickstream Analysis on Amazon EMR, Amazon Redshift and Elasticsearch | Ingest raw clickstream logs from Amazon S3, perform aggregations, and store the results on Amazon Redshift and Elasticsearch for analysis |
| ML - Train NLP Model in PySpark | Train a Spark MLlib Logistic Regression model for Natural Language Processing (NLP) using the PySpark processor |
| ML - Train Random Forest Regression Model in Scala | Train a Spark MLlib Random Forest Regression model using the Scala processor |
| Slowly Changing Dimension - Type 2 | Slowly Changing Dimension (SCD) - Type 2 |
| Spark ETL To Derive Sales Insights on Azure HDInsight And Power BI | Extract raw data and transform it (cleanse and curate) before storing it in multiple destinations for efficient downstream analysis |
| Tx Retail Inventory - Join Agg Repartition | Example using Join, Aggregation and Repartition |
| Tx Scala UDF | Example using Scala to create, register and use a User-Defined Function |
| Tx Slowly Changing Dimensions - Type 1 | Slowly Changing Dimension (SCD) - Type 1 |

Help

For questions or comments about these pipelines, reach out on any of these channels:

Chat on Slack

User Group

Ask StreamSets