
🍺 A data engineering project showcasing an ELT pipeline using modern technologies such as Delta-rs and Apache Airflow.


Rick and Morty ELT Pipeline

About • Installation • Dashboard • ELT Diagram • Airflow Graph • Improvements

About

In this data engineering project, my main goal was to build a data pipeline to extract, transform, and load data from the Rick and Morty API. I aimed to create an organized and efficient flow of data using a combination of tools and technologies.

To start, I utilized the Python requests package to fetch data from the Rick and Morty API. This marked the beginning of the data extraction process, where I collected information about characters, episodes, and locations.
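For illustration, here is a minimal sketch of that kind of paginated extraction; the function name and layout are my own, but the response shape (a `results` array plus an `info.next` link) matches the Rick and Morty API.

```python
import requests

def fetch_all(resource: str) -> list[dict]:
    """Walk the paginated Rick and Morty API and collect every record."""
    url = f"https://rickandmortyapi.com/api/{resource}"
    records = []
    while url:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        payload = response.json()
        records.extend(payload["results"])
        url = payload["info"]["next"]  # None on the last page, which ends the loop
    return records

characters = fetch_all("character")
episodes = fetch_all("episode")
locations = fetch_all("location")
```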

For data storage, I took a modern approach by implementing a Delta Lake using the delta-rs package along with the Medallion architecture. Delta-rs let me manage structured and semi-structured data effectively, with ACID transactions and versioning, without relying on Apache Spark. The result was a lightweight yet powerful storage solution for anything short of big data.
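As a sketch of what landing a bronze-layer table with delta-rs looks like (the table paths here are illustrative, not necessarily the repo's layout):

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Land the raw API payload in the bronze layer as a Delta table.
bronze_df = pd.DataFrame(characters)  # `characters` from the extraction step above
write_deltalake("data/bronze/characters", bronze_df, mode="overwrite")

# Reading it back later needs no Spark cluster, just delta-rs.
dt = DeltaTable("data/bronze/characters")
df = dt.to_pandas()
print(dt.version(), len(df))
```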

In the transformation phase, I turned to the pandas library to shape the extracted data into a more manageable format through a series of cleaning, filtering, and structuring operations on DataFrames.
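Something along these lines, for example; the exact column handling is an assumption, though the nested `origin`/`location` fields and the `created` timestamp do come from the API:

```python
import pandas as pd

# Flatten the nested `origin` and `location` fields and tidy up types —
# the kind of silver-layer shaping done with pandas here.
silver_df = pd.json_normalize(characters, sep="_")
silver_df = (
    silver_df
    .rename(columns={"origin_name": "origin", "location_name": "location"})
    .astype({"id": "int64"})
    .assign(created=lambda d: pd.to_datetime(d["created"]))
    .drop(columns=["origin_url", "location_url"], errors="ignore")
)
```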

Apache Airflow orchestrated the entire process, letting me schedule and automate the extraction, transformation, and loading steps.
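A rough sketch of what that orchestration could look like with Airflow's TaskFlow API; the DAG name, schedule, and task split are assumptions for illustration, not the repo's exact DAG definition.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def rick_and_morty_elt():
    @task
    def extract(resource: str) -> str:
        # Pull the paginated API resource and land it in the bronze layer
        # (see the extraction sketch above; omitted here for brevity).
        return resource

    @task
    def transform_and_load(resource: str) -> None:
        # Read bronze, reshape with pandas, write silver/gold Delta tables.
        print(f"transforming {resource}")

    for resource in ("character", "episode", "location"):
        transform_and_load(extract(resource))

rick_and_morty_elt()
```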

Using Docker, I containerized the project. This step allowed me to encapsulate the entire workflow and its dependencies, making it easily portable across different environments. Containerization ensured consistency and eliminated potential compatibility issues, making deployment a breeze.

In essence, this project showcases my ability to seamlessly gather, store, transform, and orchestrate data using a well-chosen set of tools. From extracting data through APIs to utilizing advanced storage techniques, employing data transformation libraries, orchestrating tasks with Airflow, and finally containerizing the project, every step reflects a strategic approach to building a robust and efficient data pipeline, even if only on a small scale.

ELT Diagram

Airflow Graph

Key Technologies:

  • Python: Core language for extraction and transformation
  • Apache Airflow: Workflow orchestration and scheduling
  • Delta-rs: Data storage layer with ACID transactions and versioning
  • Docker: Containerization platform for easy deployment and reproducibility
  • Tableau: Visual analytics platform

Application services at runtime:

  • One Airflow worker
  • One Airflow scheduler
  • One Airflow triggerer
  • One Airflow webserver
  • Redis
  • Postgres

Installation

  1. Download Docker Desktop and start Docker
  2. Clone the repo

git clone https://github.com/jgrove90/rick-and-morty-deltalake.git

  3. Run start.sh

sh start.sh

  4. Access application services via the web browser:

    • Airflow UI - http://localhost:8080/

  5. Run teardown.sh to remove the application from your system, including Docker images

sh teardown.sh

Improvements

Data validation could be performed prior to loading the Delta tables at each layer of the pipeline. The source data was very clean and only needed simple transformations, but as a proof of concept I might return to this project and add a validation framework such as Great Expectations, Soda Core, or Pandera.

I'd probably go with Soda Core or Pandera, as they are lightweight frameworks compared to Great Expectations.
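For example, a Pandera schema guarding the characters table before a silver-layer load might look like this; the column names follow the API, but the specific checks are illustrative:

```python
import pandera as pa

# Hypothetical checks on the characters table before loading the silver layer.
characters_schema = pa.DataFrameSchema(
    {
        "id": pa.Column(int, pa.Check.gt(0), unique=True),
        "name": pa.Column(str, pa.Check.str_length(min_value=1)),
        "status": pa.Column(str, pa.Check.isin(["Alive", "Dead", "unknown"])),
        "species": pa.Column(str),
    },
    coerce=True,
)

# Raises a SchemaError if any check fails; `silver_df` is the transformed frame.
validated_df = characters_schema.validate(silver_df)
```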

Finally, a more in-depth statistical analysis could be performed using:

  • Jupyter Lab
  • Dashboards (might revisit this and make it more visually appealing)