Abstract

Short Data Engineering project, to ensure some skills are mastered before moving on to harder tasks

Python is the main language, used with multiple librairies like FastAPI, Spark and Streamlit

The project goal is to generate data, make it available through an API. Then extract it, transform it and load it in a webapp that allows to visualize and analyze the data we produced in the first step.

Diagram

Guide

First set up a virtual environment :

python3 -m venv venv/

Then, install the requirements :

pip install -r requirements.txt

If you want to start the API server (assuming you're in sensor_api folder) :

uvicorn main:app --reload

If you want to start airflow server (assuming you're in airflow folder) :

export AIRFLOW_HOME=$(pwd) && airflow standalone

Connect to localhost:8080, connect with the credentials given by the previous command

You should see this DAG, this is the DAG that run extraction and transformation, run it everytime you want, otherwise it will launch itself hourly

If you want to start the streamlit webapp (assuming you're in streamlit_app folder) :

streamlit run main.py

Apache Parquet

Some precisions about Apache Parquet :

Column-oriented
Efficiently compressed
fast OLAP request
More efficient than CSV, especially to reduce cloud cost

Next steps ?

If I had to enhance this project, I would use a cloud service like AWS for file storage and Spark computations, add more options and pages in the web app to add more ways to explore the data, improve the API throughput and maximum load

Project idea and tickets by Benjamin Dubreu - Data Upskilling

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
airflow		airflow
data_transformer		data_transformer
extractor_consumer		extractor_consumer
schema		schema
sensor_api		sensor_api
streamlit_app		streamlit_app
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Abstract

Diagram

Guide

Apache Parquet

Next steps ?

About

Releases

Packages

Languages

lucabresolin/DEProject

Folders and files

Latest commit

History

Repository files navigation

Abstract

Diagram

Guide

Apache Parquet

Next steps ?

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages