This project creates a data pipeline that ingests data from a Flume sender, processes it through Kafka and Spark, and stores the results in Solr, all launched from a single script.
Assumptions:
Requires Docker
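Before running the script, it can help to confirm Docker is actually available. The sketch below is a pre-flight check of this document's own making (the messages are illustrative, not from the project):

```shell
#!/bin/sh
# Pre-flight check: is the docker CLI on PATH? (sketch only; the
# pipeline script itself may or may not perform this check)
docker_check() {
  if command -v docker >/dev/null 2>&1; then
    echo "Docker found: $(docker --version)"
  else
    echo "Docker is not installed; install it before running startStreaming.sh"
  fi
}

docker_check
```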
Steps:
Download the data-pipeline repository.
Starting the script:
This may take some time (approximately 15-20 minutes), as it downloads all required packages and starts Kafka, Spark, Solr, and Flume.
sh startStreaming.sh <HOST MACHINE IP or EC2 INSTANCE NAME> <PEM FILE> <INPUT FILE TO BE READ>
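The expected argument order can be sketched as below. This is a hypothetical re-statement of the interface for illustration; the IP, key file, and input file names are placeholder values, and the real startStreaming.sh may validate its arguments differently:

```shell
#!/bin/sh
# Sketch of the three positional arguments startStreaming.sh expects
# (hypothetical validation, not the script's actual internals).
start_streaming() {
  if [ "$#" -lt 3 ]; then
    echo "Usage: sh startStreaming.sh <HOST_IP_OR_EC2_NAME> <PEM_FILE> <INPUT_FILE>"
    return 1
  fi
  target="$1"       # host machine IP or EC2 instance name
  pem_file="$2"     # SSH key (.pem) used to reach the instance
  input_file="$3"   # file to be streamed through Flume -> Kafka -> Spark -> Solr
  echo "Starting pipeline on ${target} with input ${input_file} (key: ${pem_file})"
}

# Example invocation with placeholder values:
start_streaming 203.0.113.10 mykey.pem input.txt
```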
Dashboard URL - http://<Your machine name>:8963/solr
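Once the script finishes, the dashboard can be probed from the command line. The snippet below builds the URL from the host name using the port stated above; "localhost" is a placeholder for your machine name, and the probe is a generic curl check, not something the project ships:

```shell
#!/bin/sh
# Build the Solr dashboard URL for a given host (port 8963 as stated above).
solr_url() {
  echo "http://${1}:8963/solr"
}

# Probe the dashboard with a short timeout; harmless if the stack
# is not up yet ("localhost" is a placeholder host).
URL=$(solr_url localhost)
if curl -s -m 5 -o /dev/null "$URL"; then
  echo "Solr dashboard reachable at $URL"
else
  echo "Solr not reachable yet at $URL"
fi
```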