
Simple-data-pipeline

A simple data pipeline, containerized and deployed on Kubernetes with Helm, for learning ETL.

Tech stack

  • Containerization (Docker, Kubernetes, minikube, Helm)
  • Airflow
  • SQL (Postgres)

Description

This pipeline extracts data from the production database of the Simple web-app project, transforms it, and loads it into a data sink, using Python, SQL (Postgres), and Airflow for job scheduling.
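
Conceptually, each scheduled run follows the classic extract-transform-load shape. A minimal sketch, assuming hypothetical connection URLs, a `users` source table, and a `transform.py` helper (none of these names come from this repo):

```bash
# Hypothetical illustration only: the connection URLs, the users table,
# and transform.py are placeholders, not code from this repository.
# Extract rows from the production (source) database as CSV, transform
# them in Python, and load the result into the sink database.
psql "$SOURCE_DB_URL" -c "COPY (SELECT * FROM users) TO STDOUT WITH CSV HEADER" \
  | python transform.py \
  | psql "$SINK_DB_URL" -c "COPY users_clean FROM STDIN WITH CSV HEADER"
```

In the repo itself this logic is wrapped in Airflow DAGs so runs are scheduled and retried automatically.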

Deployment

  • Install Docker, Kubernetes, minikube, and Helm.

  • Clone and build the Simple web-app project, following its instructions.

  • Clone this project locally.

  • (Optional) Build your own Airflow image (with your own DAGs) using the provided Dockerfile; a build sketch follows the commands below.

  • Deploy Airflow on minikube using the built Airflow image (currently my image):

```bash
cd `path_to_this_repo`/hieu_airflow/deployment

helm repo add apache-airflow https://airflow.apache.org
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace

helm upgrade -f values.yaml airflow apache-airflow/airflow --namespace airflow
```
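
If you built your own image in the optional step above, here is a minimal sketch of getting it into minikube (`my-airflow:0.1` is a placeholder tag, not an image this repo publishes):

```bash
# Build the custom Airflow image from the provided Dockerfile and load it
# into minikube's container runtime; my-airflow:0.1 is a placeholder tag.
docker build -t my-airflow:0.1 .
minikube image load my-airflow:0.1
```

Then point the chart at your image by setting `images.airflow.repository` and `images.airflow.tag` in values.yaml before the `helm upgrade -f values.yaml` step.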

Result

  • Verify the deployment and service:

```bash
kubectl get deployment -n airflow
kubectl get service -n airflow
```

NOTE: Run `minikube tunnel` if the LoadBalancer service does not expose an External-IP.
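
To reach the Airflow web UI without a LoadBalancer, port-forwarding also works; a sketch assuming the release name `airflow` used above and the official chart's default webserver service name:

```bash
# Forward the Airflow webserver to localhost:8080
# (airflow-webserver is the chart's default service name for this release).
kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
```

Then open http://localhost:8080 in a browser.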
