A scalable and robust data pipeline that extracts data from an API, transforms it with PySpark, builds an ML model on the transformed data to predict the price, and serves the results on a website built with Shiny in R.
The data pipeline is triggered every day and ingests data into the database; the ML model makes its predictions on a weekly basis.
- Apache Spark
- Apache Airflow
- Docker Compose
- SQLite
- Cassandra
- Jenkins
- Kubernetes
- Shiny Dashboard
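
The daily-ingest, weekly-predict schedule described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names `tasks_due` and `ingest`, the `prices` table and its columns, the choice of Monday for the weekly run, and the replace-on-rerun behaviour are all assumptions made for the example. SQLite stands in for "the database" only because it ships with Python.

```python
from __future__ import annotations

import sqlite3
from datetime import date

# Assumption: the weekly prediction runs on Mondays; the README only says "weekly".
PREDICTION_WEEKDAY = 0  # Monday, as returned by date.weekday()


def tasks_due(run_date: date) -> list[str]:
    """Return the pipeline stages that should run on the given date."""
    tasks = ["ingest"]  # ingestion runs every day
    if run_date.weekday() == PREDICTION_WEEKDAY:
        tasks.append("predict")  # prediction runs once a week
    return tasks


def ingest(conn: sqlite3.Connection, run_date: date,
           rows: list[tuple[str, float]]) -> int:
    """Load one day's records and return the total row count.

    Hypothetical schema: (day, item) is the primary key, so re-running
    the same day replaces existing rows instead of duplicating them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "day TEXT, item TEXT, price REAL, PRIMARY KEY (day, item))"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO prices (day, item, price) VALUES (?, ?, ?)",
            [(run_date.isoformat(), item, price) for item, price in rows],
        )
    return conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```

For example, `tasks_due(date(2024, 1, 1))` (a Monday) includes both stages, while any other weekday yields only `ingest`. In an Airflow deployment the same split would more likely be expressed as separate schedules on the DAGs themselves rather than as a helper function like this one.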