sushithks/Data-Engineering
Building an Efficient ETL Pipeline with Apache Airflow, Docker, and PostgreSQL

Architecture Breakdown:

The architecture of the ETL pipeline involves the following components:

- Dockerized Apache Airflow: ensures isolation and reproducibility of the workflow environment.
- Python scripts for data transformation: use pandas and related libraries to process data efficiently.
- PostgreSQL as the data warehouse: stores the transformed data for querying and reporting.

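As a rough illustration of the transform-and-load stages described above, here is a minimal sketch using pandas. The column names, cleaning rules, and the `orders` table are hypothetical placeholders, not the repository's actual schema; loading via `DataFrame.to_sql` requires SQLAlchemy and a PostgreSQL driver such as psycopg2.

```python
import pandas as pd

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw records: drop incomplete rows, normalise names, cast amounts."""
    # Hypothetical columns for illustration; the real pipeline's schema may differ.
    df = raw.dropna(subset=["order_id", "amount"]).copy()
    df["customer"] = df["customer"].str.strip().str.title()
    df["amount"] = df["amount"].astype(float)
    return df

def load(df: pd.DataFrame, dsn: str) -> None:
    # Append transformed rows into a warehouse table named "orders"
    # (assumed name). Requires SQLAlchemy + a PostgreSQL driver.
    df.to_sql("orders", dsn, if_exists="append", index=False)

if __name__ == "__main__":
    raw = pd.DataFrame({
        "order_id": [1, 2, None],            # third record is incomplete
        "customer": ["  alice ", "BOB", "carol"],
        "amount": ["10.5", "20", "7"],
    })
    clean = transform(raw)
    print(len(clean))  # the two complete rows survive the cleaning step
    # load(clean, "postgresql+psycopg2://user:pass@localhost:5432/warehouse")
```

In the actual pipeline, functions like these would be wired into an Airflow DAG as separate tasks so each stage is retried and logged independently.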

If you're using a Windows system, make sure virtualization is enabled in the BIOS settings. Docker relies on virtualization technology to run containers and will not work as expected without it.

Important link: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html

You can find a detailed explanation of the pipeline here: Medium post

About

Data Engineering projects
