Autonomous self test Big Data Pipeline from Flume > Kafka > Spark > Solr

sricheta92/Data-Pipeline

Flume->Kafka->Spark->Solr: data-pipeline

This project creates a data pipeline that receives data from a Flume sender, processes it through Kafka and Spark, and stores the results in Solr, all launched from a single script.

Assumptions:

Requires Docker
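
Since Docker is the only stated prerequisite, it can help to confirm it is available before starting. The following is a minimal sketch, not part of the repository: `require_cmd` and `check_docker` are hypothetical helper names, and the check assumes that a working setup means the `docker` binary is on `PATH` and `docker info` can reach the daemon.

```shell
# Hypothetical helper (not from the repository): succeed only if the
# named command can be found on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical preflight check: report whether Docker is installed and
# whether its daemon answers. Returns 1 or 2 on failure, 0 on success.
check_docker() {
    if ! require_cmd docker; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not reachable" >&2
        return 2
    fi
    echo "docker is ready"
}
```

Running `check_docker` before the start script gives an early, readable failure instead of one partway through the setup.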

Steps:

Download the data-pipeline repository.

Starting the script:

This may take 15-20 minutes, as it downloads all required packages and starts Kafka, Spark, Solr, and Flume.

 sh startStreaming.sh <HOST MACHINE IP or EC2 INSTANCE NAME> <PEM file> <INPUT FILE TO BE READ>
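
Because the run is long, it may be worth validating the arguments before launching. The sketch below assumes the script takes three positional arguments (host IP or EC2 instance name, PEM file, input file), which is one reading of the usage line above. `run_pipeline` and the `LAUNCHER` variable are hypothetical names introduced here for illustration; they are not part of the repository.

```shell
# Hypothetical wrapper (not from the repository): check the three
# arguments, then hand them to startStreaming.sh unchanged.
run_pipeline() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_pipeline <host-ip-or-ec2-name> <pem-file> <input-file>" >&2
        return 64
    fi
    host=$1
    pem=$2
    input=$3
    if [ ! -r "$pem" ]; then
        echo "PEM file not readable: $pem" >&2
        return 66
    fi
    if [ ! -r "$input" ]; then
        echo "input file not readable: $input" >&2
        return 66
    fi
    # LAUNCHER is an assumption for dry runs; by default the real script runs.
    # It is left unquoted on purpose so "sh startStreaming.sh" splits in two.
    ${LAUNCHER:-sh startStreaming.sh} "$host" "$pem" "$input"
}
```

Setting `LAUNCHER=echo` before calling `run_pipeline` prints the arguments that would be passed instead of starting the pipeline.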

Dashboard URL - http://<Your machine name>:8963/solr
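
Once the script finishes, the dashboard can be polled from the command line. This is a rough sketch that assumes `curl` is installed and that the dashboard answers plain HTTP on the port given above. `solr_url`, `wait_for_solr`, and `SOLR_PORT` are hypothetical names, and the retry count and 10-second delay are arbitrary choices.

```shell
# Port taken from the dashboard URL above; override if your setup differs.
SOLR_PORT=${SOLR_PORT:-8963}

# Hypothetical helper: build the dashboard URL for a given host name.
solr_url() {
    printf 'http://%s:%s/solr\n' "$1" "$SOLR_PORT"
}

# Hypothetical helper: wait_for_solr HOST [TRIES]
# Polls the dashboard URL until it responds or the tries run out.
wait_for_solr() {
    url=$(solr_url "$1")
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors; -s keeps the polling quiet.
        if curl -fs -o /dev/null "$url"; then
            echo "Solr is up at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "Solr did not respond at $url" >&2
    return 1
}
```

For example, `wait_for_solr <Your machine name>` waits up to about five minutes with the defaults before giving up.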
