SequenceIQ Hadoop sample projects
This repository is a collection of sample projects and code examples featured in our blog entries; for more details, check the SequenceIQ blog. The samples and the blog contain random thoughts, proofs of concept, and interesting issues we have faced while building our product stack.
Where the samples are not covered by a blog entry, we try to make them self-explanatory or supply a short readme. Please feel free to collaborate, share, ask for help, or report issues.
- flume-sources module: Custom Apache Flume source
- etl-samples module: ETL - producing better quality data
- hdp-sandbox-access module: Accessing HDP2 sandbox from the host
- lastfm-morphlines-etl module: How-to: Process Data using Morphlines (in Kite SDK)
- hdp-sandbox-access module: HDFS and java.nio.channels
- mapreduce-morphline module: Data cleaning with MapReduce and Morphlines
- yarn-queue-tests module: YARN Capacity Scheduler
- tez-dag-jobs module: Using Mahout with Tez
- yarn-monitoring-R module: Monitoring YARN with R
- scalding-correlation module: Correlation example with Scalding
- spark-clustering module: K-means clustering on Spark