A simple data pipeline, deployed via containerization with Kubernetes and Helm, for learning ETL.
- Containerization (Docker, Kubernetes, Minikube, Helm)
- Airflow
- SQL (Postgres)
This pipeline extracts data from the production database of the Simple web-app project, transforms it, and loads it into a data sink, using Python, SQL (Postgres), and Airflow for job scheduling.
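The DAG code itself is ordinary Python. Below is a minimal sketch of such an ETL DAG, assuming Airflow 2.4+ with the Postgres provider installed; the connection IDs (`webapp_postgres`, `sink_postgres`) and table names are hypothetical placeholders, not this repo's actual configuration:

```python
# Minimal ETL DAG sketch. Connection IDs and table names are
# hypothetical placeholders, not this repo's actual configuration.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def simple_etl():
    @task
    def extract():
        # Read rows from the web-app's production database.
        src = PostgresHook(postgres_conn_id="webapp_postgres")
        return src.get_records("SELECT id, name FROM users")

    @task
    def transform(rows):
        # Trivial example transformation: lowercase the name column.
        return [[row[0], row[1].lower()] for row in rows]

    @task
    def load(rows):
        # Write the transformed rows into the data sink.
        sink = PostgresHook(postgres_conn_id="sink_postgres")
        sink.insert_rows(table="users_clean", rows=rows,
                         target_fields=["id", "name"])

    load(transform(extract()))


simple_etl()
```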
- Install Docker, Kubernetes, Minikube, and Helm (one possible setup is sketched below).
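For example, on macOS with Homebrew (adjust for your platform; see each tool's official install guide):

```bash
# Example setup on macOS via Homebrew; adjust for your platform.
brew install --cask docker         # Docker Desktop
brew install minikube helm kubectl
minikube start                     # spin up a local Kubernetes cluster
```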
- Clone and build the Simple web-app project, following its instructions.
- Clone this project locally.
- (Optional) Build your own Airflow image (with your own DAGs) using the provided Dockerfile, e.g. as shown below.
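For example (the image name `my-airflow` is just a placeholder, and the Dockerfile is assumed to sit in the current directory):

```bash
# Build a custom Airflow image and make it visible to minikube.
docker build -t my-airflow:latest .
minikube image load my-airflow:latest
```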
- Deploy Airflow on Minikube using the built Airflow image (currently my Airflow image):
```bash
cd path_to_this_repo/hieu_airflow/deployment
helm repo add apache-airflow https://airflow.apache.org
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace
helm upgrade -f values.yaml airflow apache-airflow/airflow --namespace airflow
```
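The `values.yaml` passed above is where a custom image gets wired in. The official chart exposes the image under `images.airflow`; a minimal sketch (repository and tag values are illustrative):

```yaml
images:
  airflow:
    repository: my-airflow   # your custom image
    tag: latest
```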
- Verify the deployment and service:
```bash
kubectl get deployment -n airflow
kubectl get service -n airflow
```
NOTE: If the LoadBalancer service does not expose an External-IP, run

```bash
minikube tunnel
```

in a separate terminal.
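As an alternative to `minikube tunnel`, you can port-forward the webserver service directly (the service name `airflow-webserver` assumes the chart's default `<release>-webserver` naming with the release name `airflow` used above):

```bash
kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow
```

Then open http://localhost:8080 in a browser.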