Data Science Pipeline

This project demonstrates the technologies needed to build an end-to-end data science pipeline: consuming data from an original source, processing and storing it, and finally serving machine-learning-based results to end users.

Technologies Used

This data science pipeline consists of the following applications and services:

  • Amazon AWS - I used Amazon DynamoDB, Lambda, and API Gateway to run calculations on a large database, stored in MongoDB, and expose a GET endpoint.

  • Angular 8 - An Angular 8 front end that provides a dashboard showing the data collection and machine learning results.

  • Bash Scripts - Linux Bash scripts streamline the start-up of all the services this project needs.

  • Docker Compose - I used Docker Compose to connect microservice containers for the final product.

  • Kafka - To allow for multiple microservices and an HDFS-based Data Lake to consume live tweets, I set up a Kafka server.

  • Spring Boot REST Service - To authenticate users and store their data, I used a Spring Boot REST web service backend.

  • Twitter Stream Consumer - Python-based consumption of tweets using the tweepy library.

  • Flask Machine Learning Service - Flask-based application to provide machine learning results.
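The data flow described in the list above (one tweet stream feeding both a data lake and a machine learning service through Kafka) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `InMemoryTopic` is a hypothetical in-memory stand-in for a Kafka topic and does not use the Kafka client API, `toy_score` is a placeholder for whatever model the Flask service runs, and the group names `data-lake` and `ml-service` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic (not the Kafka client API)."""

    def __init__(self) -> None:
        self._log: List[dict] = []  # append-only message log
        self._offsets: Dict[str, int] = defaultdict(int)  # next unread index per group

    def publish(self, message: dict) -> None:
        self._log.append(message)

    def poll(self, group: str) -> List[dict]:
        """Return the messages this group has not read yet and advance its offset."""
        start = self._offsets[group]
        self._offsets[group] = len(self._log)
        return self._log[start:]


def toy_score(text: str) -> int:
    """Placeholder for a real model: counts the words in the tweet text."""
    return len(text.split())


def run_demo() -> dict:
    topic = InMemoryTopic()
    # Stream consumer side: publish incoming tweets to the topic.
    for text in ["hello world", "data science pipelines are fun"]:
        topic.publish({"text": text})

    # Data lake side: store every raw message.
    data_lake = list(topic.poll("data-lake"))
    # ML service side: read the same messages independently and score them.
    scores = [toy_score(m["text"]) for m in topic.poll("ml-service")]
    return {"stored": len(data_lake), "scores": scores}


if __name__ == "__main__":
    print(run_demo())
```

Each group keeps its own read offset, so the data lake and the machine learning service both see every tweet without interfering with each other. That fan-out behavior is what makes a log-based broker a reasonable fit when several microservices consume the same live stream.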
