
Transformer: Sample Pipelines

This folder contains pipeline templates and samples for Transformer.

The following templates/samples are currently available:

| Name | Description |
|---|---|
| Clickstream Analysis on Amazon EMR, Amazon Redshift and Elasticsearch | Ingest raw clickstream logs from Amazon S3, aggregate them, and store the results in Amazon Redshift and Elasticsearch for analysis |
| ML - Train NLP Model in PySpark | Train a Spark MLlib Logistic Regression model for Natural Language Processing (NLP) using the PySpark processor |
| ML - Train Random Forest Regression Model in Scala | Train a Spark MLlib Random Forest Regression model using the Scala processor |
| Slowly Changing Dimension - Type 2 | Example of a Slowly Changing Dimension (SCD) Type 2 pipeline, which preserves history by adding a new row version for each change |
| Spark ETL To Derive Sales Insights on Azure HDInsight And Power BI | Extract raw data and transform it (cleanse and curate it) before storing it in multiple destinations for efficient downstream analysis |
| Tx Retail Inventory - Join Agg Repartition | Example using Join, Aggregation and Repartition |
| Tx Scala UDF | Example using Scala to create, register and use a user-defined function (UDF) |
| Tx Slowly Changing Dimensions - Type 1 | Example of a Slowly Changing Dimension (SCD) Type 1 pipeline, which overwrites changed attributes without keeping history |
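To make the difference between the two SCD samples concrete, here is a minimal plain-Python sketch of Type 1 and Type 2 merge semantics. It is only an illustration of the general pattern under stated assumptions, not how the Transformer pipelines implement it: `DimRow`, `scd1_merge` and `scd2_merge` are hypothetical names, and the row layout (`valid_from`, `valid_to`, `is_current`) is an assumed convention.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical dimension row layout (assumed convention, not the pipeline schema).
    key: str                          # business (natural) key, e.g. a customer ID
    attrs: tuple                      # tracked attribute values, e.g. (city,)
    valid_from: date                  # date this version became effective
    valid_to: Optional[date] = None   # None means the version is still valid
    is_current: bool = True


def scd1_merge(dim: dict, updates: dict) -> dict:
    """Type 1: overwrite changed attributes in place; no history is kept."""
    merged = dict(dim)
    merged.update(updates)
    return merged


def scd2_merge(dim: list, updates: dict, as_of: date) -> list:
    """Type 2: expire the current version and append a new one when attributes change."""
    current = {r.key: r for r in dim if r.is_current}
    result = []
    for row in dim:
        new_attrs = updates.get(row.key)
        if row.is_current and new_attrs is not None and new_attrs != row.attrs:
            # Close out the old version instead of overwriting it.
            result.append(replace(row, valid_to=as_of, is_current=False))
        else:
            result.append(row)
    for key, attrs in updates.items():
        old = current.get(key)
        if old is None or old.attrs != attrs:
            # New key, or changed attributes: add a new current version.
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result


if __name__ == "__main__":
    dim = [DimRow("c1", ("Berlin",), date(2020, 1, 1))]
    dim = scd2_merge(dim, {"c1": ("Paris",), "c2": ("Rome",)}, as_of=date(2021, 6, 1))
    for r in dim:
        print(r)
```

In this sketch, Type 2 keeps the expired `Berlin` row alongside the new current `Paris` row, while Type 1 would simply replace `Berlin` with `Paris`. An update whose attributes match the current version leaves the dimension unchanged.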

Help

For any queries, questions, or comments related to these pipelines, reach out on any of these channels: