This is a simple repo to work with Apache Airflow.

Install Airflow with the version-pinned constraint file:

```bash
export AIRFLOW_HOME=~/airflow

AIRFLOW_VERSION=2.9.0
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```
Please note that your system may have `python3` rather than `python`. In that case, replace `python` with `python3` in the command above.
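For example, on a system that only ships `python3`, the same install would look like this (a sketch of the substitution; nothing else changes):

```bash
AIRFLOW_VERSION=2.9.0
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# `python3 -m pip` makes sure pip installs into the same interpreter.
python3 -m pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```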
Verify the installation:

```bash
airflow version
airflow info
airflow db check
```
Next, add this repo's `dags` folder to `~/airflow/airflow.cfg`. First, check the current `airflow.cfg` file. If it's empty, you can add the following lines to it:

```ini
[core]
dags_folder = /path/to/this/repo/dags
```

Please note that you need to replace `/path/to/this/repo` with the actual absolute path to this repo, which you can get by running the following command from the repo root:

```bash
pwd
```
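If you want to confirm that Airflow picked up the new setting, the Airflow 2.x CLI can echo it back:

```bash
airflow config get-value core dags_folder
```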
Start Airflow in standalone mode:

```bash
airflow standalone
```

You should see the credentials for the default user in the startup output. You can change the password for the default user by editing the `standalone_admin_password.txt` file.
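That file lives in `AIRFLOW_HOME` (here `~/airflow`, per the export above), so you can also read the generated password back at any time:

```bash
cat ~/airflow/standalone_admin_password.txt
```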
Open a browser and go to http://localhost:8080/. You should see the Airflow UI with all the example DAGs and your own DAGs, too!
To hide the example DAGs, you can add the following line to the `airflow.cfg` file:

```ini
[core]
load_examples = False
```
Then you need to reset the database:

```bash
airflow db reset
```
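Note that `airflow db reset` drops all existing DAG runs, users, and connections, so only use it on a local sandbox. A minimal non-interactive sketch, assuming you restart standalone afterwards:

```bash
# -y / --yes skips the confirmation prompt; everything in the metadata DB is wiped.
airflow db reset --yes
airflow standalone
```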
To run the Kubernetes DAG locally, you need to install `kubectl` and `minikube`; follow the official installation guide for each tool. After installing `kubectl` and `minikube`, you can start the minikube cluster by running the following command:

```bash
minikube start
```
Then confirm that the minikube cluster is running:

```bash
kubectl get nodes
```

or

```bash
kubectl cluster-info
```
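minikube also ships its own health check, if you prefer a single summary of the cluster components:

```bash
minikube status
```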
Now you need to install the Airflow provider for Kubernetes:

```bash
pip install apache-airflow-providers-cncf-kubernetes
```

For more information, you can check the `apache-airflow-providers-cncf-kubernetes` documentation.
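To confirm Airflow can see the provider, you can list the installed providers (the `providers list` subcommand is part of the Airflow 2.x CLI):

```bash
airflow providers list | grep -i cncf
```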
Now you can run the Kubernetes DAG by going to the Airflow UI and turning on the `kubernetes` DAG. Every time you run the DAG, it creates a new pod in the minikube cluster and runs the task in that pod; the pod is deleted after the task finishes. To watch the pods being created and torn down, you can use the following command:

```bash
watch kubectl get pods -A
```
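To follow a task pod's logs directly, you can tail them with `kubectl logs` (the pod name below is a placeholder; copy the real name from the `kubectl get pods` output, and adjust `-n` if your operator schedules pods outside the `default` namespace):

```bash
# <pod-name> is hypothetical; substitute the name shown by `kubectl get pods`.
kubectl logs -f <pod-name> -n default
```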