# Airflow

Airflow is a tool for scheduling sets of processes. It is commonly used in ETL pipelines and sometimes in ML automation. This section covers typical ways to use Airflow.

We typically run Airflow in a Docker container to make sure that we are working in a clean environment.

Build the Docker image described in `packages/airflow_files/dockerfile` and run it with the `standalone` command.

```bash
docker build -t airflow -f packages/airflow_files/dockerfile .
docker run -d --rm --name airflow -p 8080:8080 -v ./:/knowledge airflow standalone
```

The example image is configured to create a default user with login `user` and password `user`. Use these credentials to log in to the Airflow server.

## Configuration file

Airflow's global configuration is stored in a dedicated file, `airflow.cfg`. Its location depends on the installation; typical locations are `/opt/airflow/airflow.cfg` and `~/airflow/airflow.cfg`.

---

In any case, use the following command to find the location of `airflow.cfg` on your disk.

In [9]:
!find / -name airflow.cfg

/opt/airflow/airflow.cfg
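
Searching the whole disk with `find` can be slow. As a lighter alternative, the sketch below checks only a few likely locations. The helper name `find_airflow_cfg` and its `extra_candidates` parameter are hypothetical; the sketch assumes that the `AIRFLOW_HOME` environment variable, when set, points to the directory holding `airflow.cfg`, and otherwise falls back to the two typical paths mentioned above.

```python
import os
from pathlib import Path


def find_airflow_cfg(extra_candidates=()):
    """Return the first existing airflow.cfg among likely locations, or None.

    Hypothetical helper: it only checks a short list of paths and does not
    search the whole disk the way `find / -name airflow.cfg` does.
    """
    candidates = [Path(p) for p in extra_candidates]

    # Assumption: AIRFLOW_HOME, when set, is the directory with airflow.cfg.
    airflow_home = os.environ.get("AIRFLOW_HOME")
    if airflow_home:
        candidates.append(Path(airflow_home) / "airflow.cfg")

    # The two typical locations mentioned in the text.
    candidates.append(Path("/opt/airflow/airflow.cfg"))
    candidates.append(Path.home() / "airflow" / "airflow.cfg")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print(find_airflow_cfg())
```

If none of the candidates exists the function returns `None`, in which case the `find` command above remains the fallback.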


## Adding a dag

This section shows the minimal steps needed to create an Airflow DAG. It is based on the official [Fundamental Concepts](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html) tutorial, but shows only the minimal commands needed to add a DAG and verify that it works.

---

First, identify the folder that contains the DAGs. It is specified by the `dags_folder` parameter of the configuration file.

In [1]:
!cat /opt/airflow/airflow.cfg | grep dags_folder

dags_folder = /opt/airflow/dags
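
The same value can be read programmatically. The sketch below parses the file with the standard-library `configparser`, assuming that `dags_folder` lives in the `[core]` section. The helper name `read_dags_folder` is hypothetical, and the demo runs against a tiny stand-in file rather than a real installation.

```python
import tempfile
from configparser import ConfigParser
from pathlib import Path


def read_dags_folder(cfg_path):
    """Return the dags_folder value from the [core] section of airflow.cfg."""
    # interpolation=None keeps '%' characters (common in log formats) literal
    # instead of treating them as configparser interpolation markers.
    parser = ConfigParser(interpolation=None)
    with open(cfg_path, encoding="utf-8") as cfg_file:
        parser.read_file(cfg_file)
    return parser.get("core", "dags_folder")


# Demo on a stand-in config file; on a real installation pass the path
# found earlier, for example /opt/airflow/airflow.cfg.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "airflow.cfg"
    sample.write_text("[core]\ndags_folder = /opt/airflow/dags\n", encoding="utf-8")
    print(read_dags_folder(sample))  # /opt/airflow/dags
```

Note that this reads only the file. A value set through the `AIRFLOW__CORE__DAGS_FOLDER` environment variable takes precedence in a running Airflow and would not be visible here.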


A DAG is defined in a DAG file: a Python file that creates an `airflow.models.DAG` object. The constructor accepts many settings that determine the DAG's behavior, but only `dag_id` is required. The following cell defines a DAG with the id `tutorial`.

In [2]:
%%writefile /opt/airflow/dags/tutorial.py
from airflow.models.dag import DAG

with DAG("tutorial") as dag:
    pass

Writing /opt/airflow/dags/tutorial.py
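
Not every Python file in the DAGs folder is treated as a DAG file. With the `dag_discovery_safe_mode` option enabled, Airflow only scans files whose content mentions both `DAG` and `airflow` (case-insensitive). The sketch below approximates that check; the function name `looks_like_dag_file` is hypothetical, and this is an illustration of the idea rather than Airflow's actual implementation.

```python
def looks_like_dag_file(source):
    """Approximate the safe-mode heuristic: both words must appear in the file."""
    lowered = source.lower()
    return "dag" in lowered and "airflow" in lowered


# The same content that was written to tutorial.py above.
dag_source = (
    "from airflow.models.dag import DAG\n"
    "\n"
    'with DAG("tutorial") as dag:\n'
    "    pass\n"
)

print(looks_like_dag_file(dag_source))        # True
print(looks_like_dag_file("print('hello')"))  # False
```

The minimal `tutorial.py` passes this check because its import line contains both words.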


The following command makes Airflow add the DAG to its database.

In [3]:
!airflow db migrate

DB: sqlite:////opt/airflow/airflow.db
Performing upgrade to the metadata database sqlite:////opt/airflow/airflow.db
[2025-03-22T14:14:21.698+0000] {migration.py:207} INFO - Context impl SQLiteImpl.
[2025-03-22T14:14:21.700+0000] {migration.py:210} INFO - Will assume non-transactional DDL.
[2025-03-22T14:14:21.704+0000] {migration.py:207} INFO - Context impl SQLiteImpl.
[2025-03-22T14:14:21.704+0000] {migration.py:210} INFO - Will assume non-transactional DDL.
[2025-03-22T14:14:21.705+0000] {db.py:1675} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Database migrating done!


The `airflow dags list` command shows the DAGs that Airflow can see.

In [4]:
!airflow dags list

dag_id   | fileloc                       | owners | is_paused
tutorial | /opt/airflow/dags/tutorial.py |        | True


The output lists the `tutorial` DAG that we have just added.
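
To run this verification from a script instead of reading the table by eye, the listing can be requested as JSON with the `--output json` option. The sketch below is a minimal example under two assumptions: the JSON keys match the table columns shown above (`dag_id`, `fileloc`, `owners`, `is_paused`), and nothing but JSON is printed to standard output. The helper names `parse_dag_ids` and `list_dag_ids` are hypothetical, and the sample string is hand-written to mirror the table, not captured from a real run.

```python
import json
import subprocess


def parse_dag_ids(json_text):
    """Extract DAG ids from the JSON text produced by the listing command."""
    return [row["dag_id"] for row in json.loads(json_text)]


def list_dag_ids():
    """Run `airflow dags list --output json` and return the DAG ids.

    Requires the airflow CLI on PATH; it is not called in the demo below.
    """
    result = subprocess.run(
        ["airflow", "dags", "list", "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dag_ids(result.stdout)


# Hand-written sample mirroring the table above (assumed key names).
sample = (
    '[{"dag_id": "tutorial", "fileloc": "/opt/airflow/dags/tutorial.py", '
    '"owners": "", "is_paused": "True"}]'
)
print(parse_dag_ids(sample))  # ['tutorial']
```

Inside the container, `"tutorial" in list_dag_ids()` would then serve as a simple check that the DAG was picked up.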