In this project, I created a data pipeline using Airflow. The pipeline downloads podcast episodes and stores the results in a Postgres database that can be easily queried.
Project Steps
- Download the podcast metadata XML and parse it
- Create a Postgres database to hold the podcast metadata (see the DAG sketch below)
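To make these steps concrete, here is a minimal sketch of what a DAG like podcast_summary.py could look like. The feed URL, the connection id podcasts, and the table name episodes are illustrative assumptions rather than values taken from this repository, and the sketch assumes Airflow 2.4+ with the apache-airflow-providers-postgres package installed.

```python
# Minimal sketch of the podcast DAG (names, feed URL, and connection id are
# assumptions for illustration, not necessarily what podcast_summary.py uses).
from datetime import datetime

import requests
import xmltodict
from airflow.decorators import dag, task
from airflow.providers.postgres.operators.postgres import PostgresOperator


@dag(
    dag_id="podcast_summary",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def podcast_summary():
    # Step 1: create a table in Postgres to hold the episode metadata.
    # Assumes an Airflow connection named "podcasts" pointing at the database.
    create_table = PostgresOperator(
        task_id="create_table",
        postgres_conn_id="podcasts",
        sql="""
            CREATE TABLE IF NOT EXISTS episodes (
                link TEXT PRIMARY KEY,
                title TEXT,
                published TEXT,
                description TEXT
            );
        """,
    )

    # Step 2: download the podcast metadata XML and parse it into episode records.
    @task()
    def get_episodes():
        # Example feed URL; replace with the feed the pipeline actually uses.
        feed = requests.get("https://www.marketplace.org/feed/podcast/marketplace/")
        data = xmltodict.parse(feed.text)
        return data["rss"]["channel"]["item"]

    create_table >> get_episodes()


podcast_summary()
```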
File overview:
- podcast_summary.py: the code that creates the data pipeline
- dockerfile: the Dockerfile for the Airflow image
- docker-compose.yml: the service containers that we use
Prerequisites:
- Docker Desktop
- VS Code
- Python
Project files:
- podcast_summary.py
- dockerfile
- docker-compose.yml
Create the project directory and the folders Airflow needs:

```bash
mkdir podcast
cd podcast
mkdir dags logs plugin
```
Build the Airflow image and start the containers:

```bash
docker build -t airflow-podcast .
docker-compose up -d
```

Once the containers are running, open the Airflow UI at http://localhost:8080.
Locate the podcast_summary DAG in the Airflow UI and trigger it to start the podcast data extraction, transformation, and loading process.
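After the DAG has run, the loaded metadata can be queried directly from Postgres. Here is a minimal sketch using psycopg2; the episodes table name, database name, credentials, and port are assumptions and should be adjusted to whatever docker-compose.yml and podcast_summary.py actually define.

```python
# Minimal example of querying the loaded podcast metadata from Postgres.
# Table name, credentials, and port below are assumptions; match them to
# the values defined in docker-compose.yml and the DAG.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="airflow",
    user="airflow",
    password="airflow",
)

with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT title, published FROM episodes ORDER BY published DESC LIMIT 5;"
    )
    for title, published in cur.fetchall():
        print(published, title)

conn.close()
```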
For questions or feedback, please create an issue or contact riteshojha2002@gmail.com.