Build Streaming Pipeline using Akka, Kafka and Spark.

Course Project

This was our course project for CS 441 taken under the guidance of Prof. Mark Grechanik.

Project Members

| Name | UIN |
| --- | --- |
| Aditya Kanthale | 663141759 |
| Parth Deshpande | 657711378 |
| Jeet Mehta | 668581235 |
| Sharath Bhargav | 663652557 |
| Harsh Mishra | 653554247 |

Overview:

In this project we built a streaming data pipeline using AWS cloud services. Akka, an implementation of the actor model, ingests a log file as it is generated in real time. The new log entries are delivered through Kafka, an event streaming service, to Spark for further processing. The Spark pipeline aggregates log entries of type WARN and ERROR and sends this information to stakeholders using the AWS Email Service. The pipeline's output is plotted in real time using data from the Kafka stream. A detailed end-to-end implementation, along with the setup of each AWS service used, is covered in this playlist.

Task 1:

Deploying the log-generating application on AWS Elastic Beanstalk

We deploy multiple instances of the log-generating application on AWS Elastic Beanstalk. The detailed implementation and the steps to execute it are in LogFileGenerator.

Task 2:

Real-time file monitoring

In this task we use Akka actors to monitor the files in a specified directory and extract changes to them in real time. The code and detailed documentation are in FileWatcher.
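To illustrate the idea of extracting only the changes to a growing log file, here is a minimal sketch in plain Python. It is not the project's Akka implementation: the function name `read_new_lines` and the offset-tracking approach are assumptions made for illustration. The caller remembers how far it has read and, on each poll, picks up only the complete lines appended since then.

```python
def read_new_lines(path, offset):
    """Return (complete lines appended after `offset`, new offset).

    Hypothetical helper: the watcher keeps one byte offset per file and
    calls this whenever it is told (or suspects) the file has changed.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a partially written trailing
    # line is left in place and picked up on the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

In an actor-based design, each file could be owned by one actor that stores its offset as private state and forwards the returned lines downstream as messages.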

Task 3:

Amazon Managed Streaming for Apache Kafka (MSK):

This Amazon service captures events, streams the log data, and forwards it to Spark for processing. The steps to set up and run Amazon MSK are provided in Kafka MSK.

Task 4:

Spark:

Spark processes logs in real time and analyses patterns that emerge over time. The application receives a continuous stream of logs from a Kafka topic. When the thresholds for the log types WARN and ERROR are reached, an email is triggered using the AWS Email Service. A detailed explanation with steps is provided in Spark.
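The aggregate-then-alert logic can be sketched in plain Python as follows. This is only an illustration of the idea, not the Spark job itself: the function names, the threshold values, and the assumption that the log level appears as a whitespace-separated token in each line are all hypothetical. In the real pipeline the counting would run over a batch or window of the stream, and an over-threshold result would trigger the email.

```python
from collections import Counter

# Log levels the pipeline aggregates, per the description above.
WATCHED_LEVELS = ("WARN", "ERROR")

def count_levels(lines):
    """Count WARN and ERROR lines in one batch of log lines.

    Assumes (hypothetically) that the level is a standalone token;
    only the first matching token on a line is counted.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token in WATCHED_LEVELS:
                counts[token] += 1
                break
    return counts

def levels_over_threshold(counts, thresholds):
    """Return the levels whose count reached its threshold, sorted by name."""
    return sorted(
        level for level, limit in thresholds.items()
        if counts.get(level, 0) >= limit
    )
```

A caller would send a notification only when `levels_over_threshold` returns a non-empty list, which keeps the alerting decision separate from the counting.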

Task 5:

Visualization

This is the frontend module of the project, where the data is visualized. The aggregated results from Spark are written to another Kafka topic, and dynamic plots are drawn using data from that Kafka stream. The implementation and a thorough explanation are provided in the frontend repository.
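A dynamic plot fed by a stream usually keeps only the most recent points. The sketch below shows one way to do that in plain Python. The class name `RollingSeries`, the `(timestamp, count)` record shape, and the window size are assumptions for illustration and are not taken from the frontend code.

```python
from collections import deque

class RollingSeries:
    """Keep the latest `max_points` (timestamp, count) pairs for plotting."""

    def __init__(self, max_points):
        # A deque with maxlen drops the oldest point automatically
        # once the window is full.
        self._points = deque(maxlen=max_points)

    def append(self, timestamp, count):
        """Add one aggregated result consumed from the stream."""
        self._points.append((timestamp, count))

    def xy(self):
        """Return parallel x and y lists, ready to hand to a plotting call."""
        xs = [t for t, _ in self._points]
        ys = [c for _, c in self._points]
        return xs, ys
```

Each message consumed from the results topic would call `append`, and the plot would be redrawn from `xy()`, so memory use stays bounded however long the stream runs.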

Developed with ❤️ by Aditya Kanthale, Parth Deshpande, Jeet Mehta, Sharath Bhargav and Harsh Mishra.
