Apache Airflow sample setup with Kubernetes.
Spins up a local Kubernetes cluster for Airflow with Kind:
- Leverages Airflow's official Helm chart.
- Uses a PostgreSQL database for Airflow.
- Uses the Kubernetes Executor to run each Airflow task in an isolated pod.
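In the official chart, these choices map to top-level values. A minimal sketch of the relevant `values.yaml` entries (the value names come from the official chart; treat the excerpt as illustrative, not the repo's exact file):

```yaml
# values.yaml (excerpt): executor and metadata database choices.
executor: KubernetesExecutor   # each Airflow task runs in its own pod
postgresql:
  enabled: true                # use the chart's bundled PostgreSQL as the metadata DB
```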
This is meant as a sample local setup. To run it in a production environment, please refer to the Airflow Helm chart production guide.
Kind, Docker and Helm are required for the local Kubernetes cluster.
The repo includes a `Makefile`. You can run `make help` to see usage.
Basic setup:
- Run `make k8s-cluster-up` to spin up a local Kubernetes cluster with Kind (see the cluster config sketch below).
- Run `make airflow-k8s-add-helm-chart` to add the official Airflow Helm chart to the local repo.
- Run `make airflow-k8s-create-namespace` to create a namespace for the Airflow deployment.
- Run `make airflow-k8s-up` to deploy Airflow on the local Kubernetes cluster.
- In a separate terminal, run `make airflow-webserver-port-forward` to access the Airflow webserver at http://localhost:8080.

The credentials for the webserver are `admin`/`admin`.
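For reference, `kind create cluster` accepts a declarative config file that a target like `make k8s-cluster-up` could wrap. The following is a hypothetical sketch, not taken from the repo; the file name, mount paths and host path are assumptions:

```yaml
# kind-cluster.yaml (hypothetical): passed via `kind create cluster --config`.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      # Assumed mount that would let the logs PersistentVolume (see below)
      # surface in the repo's data/ folder on the host.
      - hostPath: /absolute/path/to/repo/data
        containerPath: /opt/airflow-logs
```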
If you need to customize the Airflow deployment, you can edit `values.yaml` accordingly.
If you need to tune the Airflow configuration, you can add the corresponding environment variables in the `env` section of `values.yaml`.
The default `values.yaml` of the source Helm chart can be seen here.
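Airflow settings map to environment variables named `AIRFLOW__<SECTION>__<KEY>`. A minimal sketch of the `env` section (the two settings shown are just illustrative):

```yaml
# values.yaml (excerpt): extra environment variables for all Airflow containers.
env:
  - name: AIRFLOW__CORE__LOAD_EXAMPLES      # [core] load_examples
    value: "False"
  - name: AIRFLOW__WEBSERVER__EXPOSE_CONFIG # [webserver] expose_config
    value: "True"
```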
DAGs are deployed via GitSync.
GitSync runs as a sidecar container inside the Airflow pods, synchronising their `dags/` folder with the DAGs located in a Git repo of your choice (in this case https://github.com/guidok91/airflow/tree/master/dags).
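In the official chart this is configured through the `dags.gitSync` values. A minimal sketch pointing at the repo above (other sync settings are left at chart defaults):

```yaml
# values.yaml (excerpt): sync DAGs from Git instead of baking them into the image.
dags:
  gitSync:
    enabled: true
    repo: https://github.com/guidok91/airflow.git
    branch: master
    subPath: dags   # folder within the repo that contains the DAGs
```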
A custom Docker image is provided for the pods; this is where we can install the Airflow dependencies we need.
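The chart is pointed at such an image via its `images.airflow` values. A sketch, where the repository name and tag are assumptions:

```yaml
# values.yaml (excerpt): run the Airflow pods from a custom image.
images:
  airflow:
    repository: airflow-custom   # hypothetical image name
    tag: 0.0.1                   # hypothetical tag
    pullPolicy: IfNotPresent     # image can be side-loaded with `kind load docker-image`
```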
So that Airflow logs don't get lost every time a task finishes (e.g. when its pod gets deleted), the setup provides a PersistentVolume that shares the logs with the host system in the `data/` folder.
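One way to wire this up is a `hostPath`-backed PersistentVolume combined with the chart's `logs.persistence` values. A sketch under those assumptions; the names, size and node path are illustrative (the node path corresponds to the Kind mount sketched earlier):

```yaml
# Hypothetical PersistentVolume backing the task logs.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /opt/airflow-logs   # node path, mapped to the host's data/ folder by Kind
---
# values.yaml (excerpt): have the chart mount a matching claim for logs.
logs:
  persistence:
    enabled: true
    existingClaim: airflow-logs   # hypothetical PVC bound to the PV above
```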